PROBABILISTIC LANGUAGES AND AUTOMATA

Clarence Arthur Ellis, Ph.D.

Department of Computer Science

University of Illinois, 1969


The concept of a probabilistic language is defined and investigated. The motivation for the definition stems from the hope of using this tool to investigate programming languages and their translators. A probabilistic language over a vocabulary $T$ is defined as a class $C$ of words formed from $T$ together with a probability measure on $C$. The classes $T^*$ of finite strings and $T^\infty$ of infinite strings, and the corresponding classes of finite trees and of infinite trees, are considered. Context Free Probabilistic Languages are characterized in terms of (1) Probabilistic Grammars and (2) Probabilistic Tree Automata.


TABLE OF CONTENTS

1. INTRODUCTION

2. BASIC DEFINITIONS AND NOTATION

3. PROBABILISTIC GRAMMARS AND LANGUAGES

4. PROBABILISTIC AUTOMATA

5. CONTEXT FREE GRAMMARS

6. PROBABILISTIC TREE AUTOMATA

7. SUMMARY AND CONCLUSIONS

LIST OF REFERENCES

APPENDIX

A. APPROXIMATION OF PROBABILISTIC TURING AUTOMATA BY PROBABILISTIC PUSHDOWN AUTOMATA

B. EXAMPLES OF REGULAR TREE EXPRESSIONS

VITA



1. INTRODUCTION

In recent years, much work has been done on extensions of the theory of finite automata to obtain models of acceptors and translators of programming languages. Examples are pushdown store automata, stack automata, minimax automata, and balloon automata. There are many others. The purpose of this thesis is not simply to introduce another type of automaton, but to describe a general concept which can be adapted to any of the automata present in the literature.

It is quite natural to assign probabilities (or frequencies) to the strings of a language to try to get some quantitative measure of the "efficiency" of grammars and translators. The model obtained by doing this is called a probabilistic language, which may be considered a fuzzy set [27] containing all valid sentences of the language together with a grade-of-membership function for these sentences. Acceptors and generators for these probabilistic languages are defined as Probabilistic Automata and Probabilistic Grammars, respectively. Specifically, Context Free Probabilistic Languages are explored in depth in this thesis.

This investigation does not consider how one would find the "best" grammar or automaton for a language, or how to improve a given grammar. Indeed, the meaning of "best" is open to many interpretations. The related idea of finding good approximation grammars for languages is also unexplored. It is hoped that the tools developed here will lead to quantitative analysis in these and other areas.



2. BASIC DEFINITIONS AND NOTATION

This section presents notation and concepts which have been previously defined in the literature and are heavily used in this paper. These definitions are then altered to form probabilistic analogues. A language over a set $T$ of terminal symbols is a subset of the set $T^*$ of all strings over $T$. A phrase structure grammar over a set $T$ is a system $(N, P, S)$ in which $N$ is a finite set of nonterminal symbols and $P$ is a set of rules (called productions) of the form $(\Psi \to \zeta)$, where $\zeta$ is any string of symbols of $T \cup N$ (denoted $\zeta \in (T \cup N)^*$) and $\Psi$ is any nonempty string of symbols of $T \cup N$ (denoted $\Psi \in (T \cup N)^+$). $\Psi$ is called the generatrix of the production, and $\zeta$ is called the replacement string. $S \in N$ is the initial nonterminal.

Notation: Hereafter, when discussing languages and grammars, $A$, $B$, and $C$ will always denote elements of $N$, while $X$, $Y$, and $Z$ denote strings over $N$. Similarly, $a, b, c \in T$; $x, y, z \in T^*$; $\alpha, \beta, \gamma \in T \cup N$; $\psi, \xi, \zeta \in (T \cup N)^*$. If $x = a_1 a_2 \ldots a_n$, then the length of $x$ is $\ell(x) = n$. The null string is denoted by $\lambda$ ($\ell(\lambda) = 0$) and the empty set is $\emptyset$. $\hat L$ always denotes a language, $\hat G$ a grammar, and $\hat A$ an automaton. $I$ denotes the set of positive integers, $\mathbb{Q}$ the rationals, and $R$ the reals.

Let $\hat G = (N, P, S)$ be a grammar. If $\xi = \xi_1 \Psi \xi_2$ and $(\Psi \to \zeta) \in P$, then we write $\xi \to \xi_1 \zeta \xi_2$. If there exist strings $\zeta_0, \zeta_1, \ldots, \zeta_n$ such that $\zeta_i \to \zeta_{i+1}$ for $i = 0, 1, \ldots, n-1$, then we write $\zeta_0 \Rightarrow \zeta_n$ and we say there is a derivation of $\zeta_n$ from $\zeta_0$ with respect to $\hat G$. The language $\hat L$ generated by a grammar $\hat G$ is
$$L(\hat G) = \{x \mid S \Rightarrow x,\ x \in T^*\}.$$
If $\hat L$ is generated by a grammar with all productions $(\Psi \to \zeta)$, $\Psi \in N$, then $\hat L$ is a context free language. If, further, $\zeta$ is of the form $aB$ for some $a \in T$, $B \in N \cup \{\lambda\}$ in all productions of $\hat G$, then $\hat L$ is a regular language.
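A minimal Python sketch of the one-step relation $\xi_1 \Psi \xi_2 \to \xi_1 \zeta \xi_2$ and of the derivation relation $\Rightarrow$ may help fix the definitions. It assumes that strings are tuples of symbol names and that productions are (generatrix, replacement) pairs. The function names and the `max_steps` bound are illustrative choices. The bound is needed because derivability is undecidable for phrase structure grammars in general.

```python
from collections import deque

def one_step(xi, productions):
    """All strings xi1 + zeta + xi2 obtainable from xi = xi1 + psi + xi2 in one step.

    xi is a tuple of symbols; productions is a list of (psi, zeta) tuple pairs,
    with psi nonempty.
    """
    results = set()
    for psi, zeta in productions:
        k = len(psi)
        for i in range(len(xi) - k + 1):
            if xi[i:i + k] == psi:  # an occurrence of the generatrix
                results.add(xi[:i] + zeta + xi[i + k:])
    return results

def derives(start, target, productions, max_steps=20):
    """Breadth-first search for a derivation start => target of at most max_steps steps."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        xi, steps = frontier.popleft()
        if xi == target:
            return True
        if steps == max_steps:
            continue
        for nxt in one_step(xi, productions):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + 1))
    return False

# Context free grammar S -> aSb, S -> ab, generating {a^n b^n | n >= 1}.
P = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
```

For example, `derives(("S",), tuple("aabb"), P)` finds the derivation $S \to aSb \to aabb$. A negative answer only means that no derivation exists within `max_steps` steps.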



3. PROBABILISTIC GRAMMARS AND LANGUAGES

Definition: A Probabilistic Language (P language) over $T$ is a system $\hat L = (L, \mu)$, where $L$ is a class of words formed from $T$ and $\mu$ is a measure on the set $L$. If $\mu$ is a probability measure, then $\hat L$ is a Normalized Probabilistic Language (NP language).

Definition: A Probabilistic Grammar (P grammar) over $T$ is a system $\hat G = (N, P, \Delta)$ with the following components:

- $N$ is the finite set of nonterminals $A_1, A_2, \ldots, A_n$.
- $\Delta$ is an $n$-dimensional vector $(\delta_1, \delta_2, \ldots, \delta_n)$, with $\delta_i$ being the probability that $A_i$ is chosen as the initial nonterminal.
- $P$ is a finite set of probabilistic productions $\Psi_i \xrightarrow{p_{ij}} \zeta_{ij}$, with $\Psi_i \in (N \cup T)^+$, $\zeta_{ij} \in (N \cup T)^+$, and $p_{ij} \in R$ ($p_{ij} \ge 0$).

If $\Delta$ is stochastic, if $0 < p_{ij} \le 1$, and if $\sum_j p_{ij} = 1$ for every generatrix $\Psi_i$ contained in productions of $P$, then $\hat G$ is a Normalized Probabilistic Grammar (NP grammar).

If all productions of $\hat G$ are of the form $A \xrightarrow{p} aB$ or $A \xrightarrow{p} a$, with $A, B \in N$ and $a \in T$, then $\hat G$ is called a left linear P grammar. The probability of deriving $\zeta_n$ from $\zeta_0$ is defined as
$$pr(\zeta_0 \Rightarrow \zeta_n) = \sum_{i=1}^{k} \prod_{j=1}^{k_i} p_{ij}.$$
Here $k$ is the number of derivations of $\zeta_n$ from $\zeta_0$, and $k_i$ is the number of derivation steps $\zeta_{j-1} \to \zeta_j$ used in the $i$-th derivation. $p_{ij}$ is the probability associated with the $j$-th step of the $i$-th derivation. The derived probability of a terminal string $x \in T^+$ with respect to a left linear grammar $\hat G$ is
$$\mu(x) = \sum_{i=1}^{n} \delta_i \, pr(A_i \Rightarrow x),$$
where $N = \{A_1, A_2, \ldots, A_n\}$ and $\Delta = (\delta_1, \delta_2, \ldots, \delta_n)$.

The P language generated by $\hat G$ is $\hat L = (T^+, \mu)$, where $\mu(x)$ is the derived probability of $x$. An admissible P grammar (see Greibach [8]) is a grammar in which there exists a derivation of some $x \in T^+$ from each $A \in N$. A generalized admissible P grammar is one in which, for each $A \in N$, there exists a production with $A$ in the generatrix.
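In a left linear P grammar every production emits exactly one terminal. A string $x$ therefore has finitely many derivations, and $pr(A_i \Rightarrow x)$ can be computed by working along $x$ one symbol at a time.

The sketch below assumes a hypothetical dictionary encoding: each nonterminal maps to (p, a, B) triples, with B = None for a production $A \to a$. The name `derived_probability` is also illustrative. Exact `Fraction` arithmetic is used so that results can be compared with hand calculations.

```python
from fractions import Fraction
from functools import lru_cache

def derived_probability(x, productions, delta):
    """mu(x) = sum_i delta_i * pr(A_i => x) for a left linear P grammar.

    productions: dict A -> list of (p, a, B) for A -p-> aB, with B None for A -p-> a.
    delta: dict A -> probability that A is the initial nonterminal.
    """
    if not x:
        return Fraction(0)  # the P language is over T+, so the null string gets nothing

    @lru_cache(maxsize=None)
    def pr(A, i):
        # Sum over all derivations A => x[i:] of the product of production probabilities.
        total = Fraction(0)
        for p, a, B in productions.get(A, []):
            if a != x[i]:
                continue
            if B is None:
                if i == len(x) - 1:  # A -> a must emit the last symbol
                    total += p
            elif i < len(x) - 1:     # A -> aB leaves x[i+1:] for B
                total += p * pr(B, i + 1)
        return total

    return sum((d * pr(A, 0) for A, d in delta.items()), Fraction(0))

# A -1/2-> aA, A -1/2-> b with Delta = (1): mu(a^n b) = (1/2)^(n+1).
G = {"A": [(Fraction(1, 2), "a", "A"), (Fraction(1, 2), "b", None)]}
DELTA = {"A": Fraction(1)}
```

Memoizing on (nonterminal, position) keeps the work proportional to $|x|$ times the number of productions. Without it, the cost grows with the number of derivations.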


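Theorem 1: Every normalized left linear admissible P grammar $\hat G$ generates a normalized P language.

Proof: Define an $(n+1) \times (n+1)$ matrix $U = [u_{ij}]$ as follows:
$$u_{ij} = \sum_{\substack{a \in T \\ (A_i \to aA_j) \in P}} pr(A_i \to aA_j), \quad i \le n,\ j \le n$$
$$u_{ij} = \sum_{\substack{b \in T \\ (A_i \to b) \in P}} pr(A_i \to b), \quad i \le n,\ j = n+1$$
$$u_{ij} = 0, \quad i = n+1,\ j \le n$$
$$u_{n+1,\,n+1} = 1$$

$u_{i,n+1}$ is by definition the total probability of a derivation from $A_i$ of a terminal string of length 1. Now consider powers of the matrix $U$. The entry $u^{(k)}_{i,n+1}$ of $U^k$ gives the total probability of a derivation from $A_i$ of a string of length $\le k$. This holds because each production emits exactly one terminal, and state $n+1$, once entered, is never left. Augment the row vector $\Delta$ by zero, giving $\Delta' = (\delta_1, \delta_2, \ldots, \delta_n, 0)$, and pre-multiply $U^k$ by it. The $(n+1)$-st element of the resulting vector is the sum of the derived probabilities of all $x \in T^+$ with $\ell(x) \le k$.

Finally,
$$\sum_{x \in T^+} \mu(x) = \lim_{k \to \infty} \sum_{\substack{x \in T^+ \\ \ell(x) \le k}} \mu(x) = \lim_{k \to \infty} (\Delta' U^k)_{n+1}.$$
Since $\hat G$ is normalized, $U$ is a stochastic matrix. Since $\hat G$ is admissible, there exists $k \in I$ such that $u^{(k)}_{i,n+1} > 0$ for $i = 1, 2, \ldots, n+1$. Thus, using the theory of Markov chains [6], write
$$U^k = \begin{bmatrix} t^k_1 \\ \vdots \\ t^k_{n+1} \end{bmatrix},$$
where each row vector $t^k_i$ approaches a steady state vector $t$ as $k$ approaches infinity. Since $t^k_{n+1} = (0\ 0 \ldots 0\ 1)$ for all $k \in I$, we have $t = (0\ 0 \ldots 0\ 1)$ and
$$\lim_{k \to \infty} (\Delta' U^k) = \Delta' \lim_{k \to \infty} U^k = \Delta' \begin{bmatrix} t \\ \vdots \\ t \end{bmatrix}.$$
Every row of the limit matrix has $(n+1)$-st element 1. The $(n+1)$-st element of this vector is therefore $\sum_{i=1}^{n} \delta_i = 1$. QED.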
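The matrix argument of Theorem 1 can be checked on small grammars. Build $U$, push $\Delta'$ through $k$ steps, and read off the last component, which is the total derived probability of the strings of length at most $k$.

This is a sketch under an assumed encoding. Each nonterminal maps to (p, a, B) triples, with B = None for a production $A \to a$. The last index of the matrix plays the role of state $n+1$. The function names are hypothetical.

```python
from fractions import Fraction

def build_U(nonterminals, productions):
    """(n+1) x (n+1) matrix U of Theorem 1; index n is the absorbing state n+1."""
    n = len(nonterminals)
    index = {A: i for i, A in enumerate(nonterminals)}
    U = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for A, rules in productions.items():
        for p, _a, B in rules:
            j = n if B is None else index[B]
            U[index[A]][j] += p  # u_ij sums over the terminal a
    U[n][n] = Fraction(1)
    return U

def mass_up_to_length(nonterminals, productions, delta, k):
    """(Delta' U^k)_{n+1}: total derived probability of all x with l(x) <= k."""
    U = build_U(nonterminals, productions)
    v = [delta[A] for A in nonterminals] + [Fraction(0)]  # Delta augmented by zero
    for _ in range(k):
        v = [sum(v[i] * U[i][j] for i in range(len(v))) for j in range(len(v))]
    return v[-1]

# A -1/2-> aA, A -1/2-> b: the mass of strings of length <= k is 1 - 2^(-k).
G = {"A": [(Fraction(1, 2), "a", "A"), (Fraction(1, 2), "b", None)]}
```

For this grammar the values for $k = 1, 2, 3$ are 1/2, 3/4, and 7/8, approaching 1 as the theorem asserts.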

A P language which is generated by a left linear P grammar is called a regular P language.

Theorem 2: There exists a regular language $L$ with a probability $\mu(x)$ assigned to each $x \in L$ such that no left linear P grammar generates $(L, \mu)$.

Proof: The proof will consist simply of exhibiting such a language.

(1) Let $T = \{a\}$; then $T^+$ is the set of strings $\{a^n \mid n \in I\}$.

(2) Assign probabilities to these strings: $\mu(a^{n+1}) = 1/\sqrt{t_n}$, $n > 0$. Here $t_1 = 4$, and for $i > 1$, $t_i$ is the smallest prime such that $t_i > \max(t_{i-1}, 2^{2i})$.

(3) Assign $\mu(a) = 1 - \sum_{i=1}^{\infty} 1/\sqrt{t_i}$. This guarantees that $\sum_{n=1}^{\infty} \mu(a^n) = 1$. Moreover, $\mu(a) > 0$. Since $t_i \ge 2^{2i}$, with strict inequality for $i > 1$, we have $1/\sqrt{t_i} \le 2^{-i}$. Hence $\sum_{i=1}^{\infty} 1/\sqrt{t_i} < \sum_{i=1}^{\infty} 2^{-i} = 1$.

Next we show that no left linear P grammar generates the language $(T^+, \mu)$.

(1) Suppose the grammar $\hat G = (N, P, \Delta)$ is alleged to generate $(T^+, \mu)$. Then all $\mu(a^n)$ are in the field of numbers generated by the rationals with field extensions $p_i$. Here $p_i$ is the probability associated with the $i$-th production of $P$ if $0 < i \le |P|$, and $p_i$ is the probability $\delta_j$ in the vector $\Delta$ if $i = |P| + j$. This field is denoted $\mathbb{Q}(p_1, \ldots, p_k)$, where $k = |P| + |N|$.

(2) If all $p_i$ are in the field $\mathbb{Q}$ or are algebraic extensions of it, then the total extension is of finite degree. Consider the extension $\mathbb{Q}(1/\sqrt{t_1}, 1/\sqrt{t_2}, \ldots)$. This may be written as a union of fields, each of which (after the first) is an extension of degree 2 of the previous field. Thus
$$\bigcup_{n=1}^{\infty} \mathbb{Q}\left(\frac{1}{\sqrt{t_1}}, \frac{1}{\sqrt{t_2}}, \ldots, \frac{1}{\sqrt{t_n}}\right)$$
is a field whose degree must be infinite. Thus all of these irrationals cannot lie within the finite degree algebraic field extension $\mathbb{Q}(p_1, \ldots, p_k)$. All derived probabilities of finite strings under the grammar $\hat G$ are expressible as finite sums of products of the $p_i$. These derived probabilities must therefore lie within $\mathbb{Q}(p_1, \ldots, p_k)$. Thus $(T^+, \mu)$ cannot be derived using $\hat G$.

(3) Suppose some of the $p_i$ are transcendental extensions. Then $\mathbb{Q}(p_1, \ldots, p_k)$ can be obtained by a pure transcendental extension $\mathbb{Q}(p_1, \ldots, p_\ell) = Q$, followed by an algebraic extension of finite degree $Q(p_{\ell+1}, \ldots, p_k)$. In this case, $1/\sqrt{t} \notin \mathbb{Q}(p_{\ell+1}, \ldots, p_k)$ implies $1/\sqrt{t} \notin Q(p_{\ell+1}, \ldots, p_k)$, by the following argument.

Let the polynomial $f(x) = x^2 - 1/t$ be irreducible over $F_i = \mathbb{Q}(p_1, \ldots, p_i)$ but reducible over $F_{i+1} = F_i(p_{i+1})$, where $p_{i+1}$ is transcendental ($i < \ell$). Then $f(x) = (x - \alpha)(x + \alpha)$ with $\alpha \in F_i(p_{i+1})$. $\alpha$ is expressible as $g(p_{i+1})/h(p_{i+1})$, where $g/h$ is in reduced form and not in $F_i$, and $(g/h)^2 = 1/t \in R$. Hence $g^2 - \frac{1}{t} h^2 = 0$. But this equation implies that $p_{i+1}$ is algebraic over $F_i$, which is a contradiction. Thus if $f(x)$ is irreducible over $F_i$, then it is irreducible over $F_{i+1}$.

This can be applied not once but $\ell$ times. It yields: if $1/\sqrt{t} \notin \mathbb{Q}(p_{\ell+1}, \ldots, p_k)$, then $1/\sqrt{t} \notin \mathbb{Q}(p_1, \ldots, p_k)$. Using the previous part (2) of this proof for the algebraic elements $p_{\ell+1}, \ldots, p_k$, we get that there exists $t_j$ with $1/\sqrt{t_j} \notin \mathbb{Q}(p_1, \ldots, p_k)$. QED.
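The sequence $t_i$ used in the proof of Theorem 2 is easy to generate, and doing so makes the role of the bound $t_i > 2^{2i}$ visible. Each term $1/\sqrt{t_i}$ is at most $2^{-i}$, so the probabilities assigned to $a^2, a^3, \ldots$ sum to less than 1 and $\mu(a)$ is positive. This sketch uses trial division, which is adequate only for the first few terms. The helper names are illustrative.

```python
import math

def is_prime(m):
    """Trial-division primality test (fine for small m)."""
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))

def t_sequence(count):
    """t_1 = 4; for i > 1, t_i is the smallest prime with t_i > max(t_{i-1}, 2^(2i))."""
    ts = [4]
    for i in range(2, count + 1):
        m = max(ts[-1], 2 ** (2 * i)) + 1
        while not is_prime(m):
            m += 1
        ts.append(m)
    return ts

ts = t_sequence(5)
# mu(a^(n+1)) = 1/sqrt(t_n); the partial sum stays below 1, so mu(a) > 0.
partial = sum(1 / math.sqrt(t) for t in ts)
bounds_hold = all(1 / math.sqrt(t) <= 2.0 ** -i for i, t in enumerate(ts, start=1))
```

The first five terms are 4, 17, 67, 257, and 1031. Their reciprocal square roots sum to about 0.958.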


4. PROBABILISTIC AUTOMATA

The idea of the probabilistic finite automaton was originally conceived by Rabin [17]. Basically, if an automaton is in some state $q$ and receives an input $a$, then it can move into any state, and the probability of moving into state $q'$ is $p(q, a, q')$. Rabin requires that $\sum_{q' \in Q} p(q, a, q') = 1$ (called type 1 normalization in this paper) for all $q$ in the set of states $Q$ and for all $a \in T$.

The practical motivation for this requirement is that these automata can model sequential circuits which are intended to be deterministic but which exhibit stochastic behavior because of random malfunctioning of components. Thus $p(q, a, q')$ is interpreted as the conditional probability of $q'$ given $q$ and $a$, $pr(q' \mid q, a)$, so by the theorem of total probability, $\sum_{q' \in Q} pr(q' \mid q, a) = 1$.

Other interpretations may give rise to other normalizations. For example, in performing the state identification experiment with a probabilistic automaton, one might interpret $p(q, a, q')$ as $pr(q, q' \mid a)$. This implies a normalization by summing over all possible $q, q'$ values: $\sum_{q \in Q} \sum_{q' \in Q} p(q, a, q') = 1$. In fact, eight different types of probabilistic automata can be defined by the various interpretations listed in the following table.



Normalizations for Probabilistic Finite Automata

TYPE   INTERPRETATION      NORMALIZATION
1      pr(q' | q, a)       $\sum_{q' \in Q} p(q, a, q') = 1$ for all $q \in Q$, $a \in T$
2      pr(q | a, q')       $\sum_{q \in Q} p(q, a, q') = 1$ for all $q'$, $a$
3      pr(a | q, q')       $\sum_{a \in T} p(q, a, q') = 1$ for all $q$, $q'$
4      pr(q', a | q)       $\sum_{a \in T} \sum_{q' \in Q} p(q, a, q') = 1$ for all $q$
5      pr(q, a | q')       $\sum_{a \in T} \sum_{q \in Q} p(q, a, q') = 1$ for all $q'$
6      pr(q, q' | a)       $\sum_{q \in Q} \sum_{q' \in Q} p(q, a, q') = 1$ for all $a$
7      pr(q, a, q')        $\sum_{q} \sum_{a} \sum_{q'} p(q, a, q') = 1$
0      pr( | q, a, q')     $p(q, a, q') = 1$ for all $q$, $a$, $q'$
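Each normalization type in the table fixes some of the coordinates $(q, a, q')$ and sums over the rest, which suggests a single checker. The sketch below covers types 1 through 7. It assumes $p$ is stored as a dictionary keyed by (q, a, q'), with missing triples read as zero. The table `SUMMED` and the function name are illustrative.

```python
from fractions import Fraction
from itertools import product

# Coordinates summed over by each normalization type ("q" = q, "a" = a, "r" = q').
SUMMED = {1: "r", 2: "q", 3: "a", 4: "ar", 5: "aq", 6: "qr", 7: "qar"}

def is_normalized(p, Q, T, norm_type):
    """True if p(q, a, q') satisfies the given normalization type (1-7)."""
    domains = {"q": Q, "a": T, "r": Q}
    summed = SUMMED[norm_type]
    fixed = [c for c in "qar" if c not in summed]
    for fixed_vals in product(*(domains[c] for c in fixed)):
        total = Fraction(0)
        for summed_vals in product(*(domains[c] for c in summed)):
            point = {**dict(zip(fixed, fixed_vals)), **dict(zip(summed, summed_vals))}
            total += p.get((point["q"], point["a"], point["r"]), Fraction(0))
        if total != 1:
            return False
    return True

# A two-state Rabin automaton over T = {a}: type 1 normalized.
Q, T = ["q0", "q1"], ["a"]
p = {("q0", "a", "q0"): Fraction(1, 2), ("q0", "a", "q1"): Fraction(1, 2),
     ("q1", "a", "q1"): Fraction(1)}
```

With a single input symbol, types 1 and 4 coincide. The same $p$ fails type 6, because the sum over both states is 2.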

One of the important theorems concerning finite automata was first proved by Kleene in 1956. It states that for every left linear grammar there exists an automaton which accepts all and only the strings generated by the left linear grammar. Conversely, for any finite automaton there is a left linear grammar which generates all and only the strings accepted by it. Surprisingly, a corresponding theorem concerning context free languages and pushdown store automata was proved by Chomsky and Schützenberger in 1963 [3]. The analogous problems for probabilistic automata are attacked in this paper.

If the symbols $a \in T$ are interpreted as outputs instead of inputs, then the automaton becomes a generator similar to a grammar. In this case, type 4 normalization must be chosen so that an NP grammar will correspond to an NP automaton. The reason is that the probabilities of all productions with a fixed generatrix $A$ sum to 1 over both the emitted terminal and the next nonterminal.



Definition: A Probabilistic Automaton (P automaton) over $T$ is a system $\hat A = (Q, M, S, \Xi)$ with the following components:

- $Q$ is a finite set of states.
- $S$ is a finite set of storage tape symbols.
- $\Xi$ is an initial state vector.
- $M$ is a function, called a probabilistic transition function, which has associated with it a second function $p$.

The specific nature of these functions determines the type of P automaton defined. If $\Xi$ is a stochastic vector and if $\hat A$ is constrained to some normalization type, then $\hat A$ is a Normalized Probabilistic Automaton (NP automaton). Cases in which $S = \emptyset$ will be simplified to $\hat A = (Q, M, \Xi)$. Particular classes of automata are obtained by attaching constraints to the general definition. The following table lists some of the automata definable, together with their normalization constraints and the range ($R(M)$) and domain ($D(M)$) constraints on the mapping $M$.

Types of Automata

1. Deterministic Finite Automaton
   Norm: Type 1
   RD: $D(M) = Q \times T$, $R(M) \subseteq Q$

2. Nondeterministic Finite Automaton
   Norm: Type 0
   RD: $D(M) = Q \times T$, $R(M) \subseteq \mathcal{P}(Q)$

3. Probabilistic Rabin Automaton
   Norm: Type 1
   RD: $D(M) = Q \times T$, $R(M) \subseteq \mathcal{P}(Q)$

4. Probabilistic Ellis Automaton
   Norm: Type 4
   RD: $D(M) = Q \times T$, $R(M) \subseteq \mathcal{P}(Q) \cup \{\lambda\}$

5. Probabilistic Tree Automaton
   Norm: Type 4
   RD: $D(M) = Q \times T$, $R(M) \subseteq \mathcal{P}(Q^*)$

6. Probabilistic Pushdown Store Automaton
   Norm: Type 4
   RD: $D(M) = Q \times S \times T$, $R(M) \subseteq \mathcal{P}(Q \times S) \cup \{\lambda\}$

Note: $\mathcal{P}(E)$ for any set $E$ denotes the power set of $E$.

In the case of a type 4 normalized finite P automaton, $\lambda \in M(q, a)$ is used to designate termination. Any $q' \in Q$ such that $\lambda \in M(q', a)$ for some $a \in T$ is called a terminable state. Suppose that for all $q \in Q$ there exists a terminable state $q'$ accessible from $q$. That is, there exists a sequence of states $q = q_0, q_1, \ldots, q_m$ such that $q_{i+1} \in M(q_i, a_i)$ and $\lambda \in M(q_m, a_m)$, for some sequence of inputs $a_0, a_1, \ldots, a_m \in T$. Then $\hat A$ is a terminating P automaton.

A transition is a change from some state $q_i \in Q$ under an input $a \in T$ to some state $q_j \in Q$ such that $q_j \in M(q_i, a)$. It will be written $(q_i, a) \to q_j$, and $\lambda \in M(q_i, a)$ will be written $(q_i, a) \to \text{halt}$. Associated with each transition is a probability; the product of these transition probabilities is the probability $p$ of the sequence $q_0 q_1 \ldots q_n$.

A mapping $M(q, a) = \emptyset$ has probability zero associated with it, and designates that a transition out of the state $q$ under input $a$ is disallowed. The probability of acceptance of a string $x = a_1 \ldots a_n$ is
$$\sum_{i=1}^{m} \xi(q_{0i})\, p_i\, p_{\lambda i},$$
where the symbols are as follows:

- $m$ is the number of sequences $q_0 q_1 \ldots q_{n-1}$ such that $q_j \in M(q_{j-1}, a_j)$ for $j = 1, 2, \ldots, n-1$, and $\lambda \in M(q_{n-1}, a_n)$.
- $q_{0i}$ is the first state of the $i$-th sequence.
- $\xi(q)$ is a function whose value is the probability of starting in state $q$.
- $p_i$ is the probability of the $i$-th sequence.
- $p_{\lambda i}$ is the probability of the terminating transition from the last state in the $i$-th sequence.

The P language accepted by a P automaton $\hat A$ is $\hat L = (T^+, \mu)$, where $\mu(x)$ is the probability of acceptance of $x$.
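The sum over state sequences in the probability of acceptance need not be formed sequence by sequence. One can instead accumulate, for each state, the total probability of reaching it after each prefix of $x$. This gives the same value because multiplication distributes over the sum. The sketch assumes dictionaries `xi`, `step`, and `halt` (hypothetical names) for $\xi(q)$, the transitions $(q, a) \to q'$, and the transitions $(q, a) \to \text{halt}$.

```python
from fractions import Fraction

def acceptance_probability(x, xi, step, halt):
    """Probability of acceptance of x = a_1 ... a_n by a finite P automaton.

    xi:   dict q -> xi(q), the probability of starting in q
    step: dict (q, a) -> dict q' -> probability of (q, a) -> q'
    halt: dict (q, a) -> probability of (q, a) -> halt
    """
    if not x:
        return Fraction(0)
    weights = dict(xi)  # weights[q]: total probability of being in q before the next input
    for a in x[:-1]:    # a_1 ... a_{n-1} move between states
        nxt = {}
        for q, w in weights.items():
            for q2, p in step.get((q, a), {}).items():
                nxt[q2] = nxt.get(q2, Fraction(0)) + w * p
        weights = nxt
    # a_n must be read by a terminating transition (q_{n-1}, a_n) -> halt.
    return sum((w * halt.get((q, x[-1]), Fraction(0)) for q, w in weights.items()),
               Fraction(0))

# One state A: (A, a) -> A with 1/2 and (A, b) -> halt with 1/2.
XI = {"A": Fraction(1)}
STEP = {("A", "a"): {"A": Fraction(1, 2)}}
HALT = {("A", "b"): Fraction(1, 2)}
```

With these tables, `acceptance_probability("aab", XI, STEP, HALT)` multiplies 1/2 for each `a` and 1/2 for the terminating `b`.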


Theorem 3: Every finite P automaton accepts a P language which is generated by some left linear P grammar. Conversely, every left linear P grammar generates a P language which is accepted by some finite P automaton.

Proof:

(a) Consider any left linear P grammar $\hat G = (N, P, \Delta)$ over $T$. The equivalent automaton $\hat A = (Q, M, \Xi)$ is constructed as follows:

- $Q = N$ and $\Xi = \Delta$.
- For each $(A_i \to aA_j) \in P$ we define $q_j \in M(q_i, a)$, where $q_i = A_i$ and $q_j = A_j$.
- For each $(A_i \to b) \in P$, we define $\lambda \in M(q_i, b)$.

The probability of each of these transitions is defined as the probability associated with the corresponding production $A_i \xrightarrow{p_{ij}} \zeta_{ij}$. For all other pairs $(q, a)$, $M(q, a) = \emptyset$, with probability zero. For each derivation of $x$ with respect to $\hat G$, there is a sequence of transitions which accepts $x$ using $\hat A$, and by construction the probabilities are the same. Also, each $\delta_i \in \Delta$ is equivalent to $\xi(q_i)$, so the derived probability of $x$ is
$$\sum_{i=1}^{n} \delta_i\, pr(A_i \Rightarrow x) = \sum_{i=1}^{m} \xi(q_{0i})\, p_i\, p_{\lambda i},$$
which is the probability of acceptance of $x$. Thus the P language generated by $\hat G$ and the P language accepted by $\hat A$ are the same.

(b) The construction of a P grammar from an automaton $\hat A = (Q, M, \Xi)$ is as follows. Construct $\hat G = (N, P, \Delta)$, where:

- $N = Q$ and $\Delta = \Xi$.
- For each $q_j \in M(q_i, a)$, add a production $A_i \xrightarrow{p_{ij}} aA_j$ to $P$.
- For each $\lambda \in M(q_i, a)$, add a production $A_i \xrightarrow{p_i} a$.

Here $p_{ij}$ and $p_i$ are respectively the probabilities associated with the corresponding transitions $(q_i, a) \to q_j$ and $(q_i, a) \to \text{halt}$. By the argument used in part (a) of this proof, the P languages generated and accepted must be the same.
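Both constructions in the proof of Theorem 3 are simple relabelings, as the following sketch shows. It assumes hypothetical encodings. A grammar maps each nonterminal to (p, a, B) triples, with B = None for $A \to a$. An automaton is a triple (xi, step, halt) of dictionaries for $\xi$, the transitions $(q, a) \to q'$, and the transitions $(q, a) \to \text{halt}$. Converting a grammar to an automaton and back recovers the original productions.

```python
from fractions import Fraction

def grammar_to_automaton(productions, delta):
    """Theorem 3(a): Q = N, Xi = Delta; A -p-> aB gives (A, a) -> B, A -p-> a gives (A, a) -> halt."""
    step, halt = {}, {}
    for A, rules in productions.items():
        for p, a, B in rules:
            if B is None:
                halt[(A, a)] = halt.get((A, a), Fraction(0)) + p
            else:
                targets = step.setdefault((A, a), {})
                targets[B] = targets.get(B, Fraction(0)) + p
    return dict(delta), step, halt

def automaton_to_grammar(xi, step, halt):
    """Theorem 3(b): N = Q, Delta = Xi; each transition becomes the matching production."""
    productions = {}
    for (q, a), targets in step.items():
        for q2, p in targets.items():
            productions.setdefault(q, []).append((p, a, q2))
    for (q, a), p in halt.items():
        productions.setdefault(q, []).append((p, a, None))
    return productions, dict(xi)

G = {"A": [(Fraction(1, 2), "a", "A"), (Fraction(1, 2), "b", None)]}
XI, STEP, HALT = grammar_to_automaton(G, {"A": Fraction(1)})
G_BACK, DELTA_BACK = automaton_to_grammar(XI, STEP, HALT)
```

One design choice is worth noting. If a grammar lists the same production twice, the sketch merges the copies into one transition by adding their probabilities, which leaves every derived probability unchanged.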
Corollary 3.1: Every finite normalized P automaton $\hat A$ accepts a P language which is generated by some left linear normalized P grammar $\hat G$. Conversely, for every normalized $\hat G$ there exists a normalized $\hat A$ such that $L(\hat G) = L(\hat A)$. Here $L(\hat G)$ means the P language generated by $\hat G$, and $L(\hat A)$ means the P language accepted by $\hat A$.

Corollary 3.2: Every finite terminating P automaton $\hat A$ accepts a P language which is generated by some left linear admissible P grammar $\hat G$. Conversely, for every admissible $\hat G$ there exists a terminating $\hat A$ such that $L(\hat G) = L(\hat A)$.

Proof: These corollaries follow immediately from the construction in the proof of Theorem 3.


Example 1.

[Figure: State Diagram of NP Automaton (states A, B, C; arc labels include a, 1/3 and b, 1/3)]

Corresponding NP Grammar:
$A \to aA$
$A \to aB$
$B \xrightarrow{1/3} bB$
$B \xrightarrow{2/3} bC$
$C \to cC$
$C \to c$
$\Delta = (\delta_A, \delta_B, \delta_C) = (1, 0, 0)$
5. CONTEXT FREE GRAMMARS

If all productions of a P grammar $\hat G$ are of the form $(A \xrightarrow{p} \zeta)$, $A \in N$, $p \in R$, $\zeta \in (N \cup T)^+$, then $\hat G$ is called a context free P grammar. The definitions of derivation and derived probability are the same as for left linear grammars. The one difference is that the replacement string may now contain more than one nonterminal. All derivations must therefore be performed by operating upon the left-most nonterminal at each step, to avoid undesirable ambiguities. Otherwise a single parse could be reached by several derivation orders, and its probability would be counted more than once. The definition of the P language $L(\hat G)$ generated by $\hat G$ is unchanged; if $\hat G$ is a context free P grammar, then $L(\hat G)$ is called a context free P language.

Theorem 4: Every admissible context free NP grammar can be transformed into an equivalent NP grammar in Chomsky Normal Form, which means all productions are of the form $A \to BC$ or $A \to b$ ($A, B, C \in N$, $b \in T$). Equivalent P grammars are ones which generate the same P language.

Proof: The proof is a constructive one.

(a) Given an admissible context free NP grammar $\hat G = (N, P, \Delta)$, we first eliminate all productions $B_i \xrightarrow{p_{ij}} B_j$. To do so, construct a matrix $U$ whose rows and columns are labeled by the nonterminal symbols $B_i$. As the element in the $B_i$ row and $B_j$ column, take $p_{ij}$ if $B_i \xrightarrow{p_{ij}} B_j$ is a production in $\hat G$; otherwise, the element is zero.

Next construct a matrix $V$ whose rows are labeled by nonterminals and whose columns are labeled by the strings $\zeta_j \notin N$ which appear as replacement strings in productions in $P$. The element in row $B_i$ and column $\zeta_j$ is $p'_{ij}$ if there is a production $B_i \xrightarrow{p'_{ij}} \zeta_j$ in $\hat G$; otherwise, the element is zero.

$U^n V$ is a matrix with $q_{ij}$ in the row labeled $B_i$ and the column labeled $\zeta_j$. Here $q_{ij}$ is the probability of a derivation $B_i \to B_{k_1} \to \cdots \to B_{k_n} \to \zeta_j$ of length $n + 1$. Thus
$$\left( \sum_{n=0}^{\infty} U^n \right) V$$
is the total probability matrix for $B_i \Rightarrow \zeta_j$. The unit productions are replaced by the productions $B_i \to \zeta_j$ carrying these probabilities.

To show normalization, we must show that $\left( \sum_{n=0}^{\infty} U^n \right) V$ is a stochastic matrix. This is true for the combined square matrix
$$\begin{pmatrix} U & V \\ 0 & I \end{pmatrix},$$
and so it is true for
$$\begin{pmatrix} U & V \\ 0 & I \end{pmatrix}^n = \begin{pmatrix} U^n & \left( \sum_{i=0}^{n-1} U^i \right) V \\ 0 & I \end{pmatrix}, \quad n = 1, 2, \ldots.$$
Admissibility rules out any set of nonterminals whose productions are all unit productions into that same set, so $\lim_{n \to \infty} U^n = 0$. Hence $\left( \sum_{n=0}^{\infty} U^n \right) V$ exists and must be a stochastic matrix.

(b) After all eliminations of type (a) are completed, consider any production whose replacement string is of length $n$ ($n > 2$). It can be reduced to productions with replacement strings of length $n-1$ by the following procedure. Replace $A \xrightarrow{p} \alpha\beta\psi$ by $A \xrightarrow{p} \alpha D$ and $D \xrightarrow{1} \beta\psi$, where $D$ is a nonterminal not in $N$ of the old grammar. Since $\alpha\beta\psi$ is a string of length $n$, $\beta\psi$ has length $n-1$. By repeating this procedure, the maximum length can be reduced to 2.

(c) Replace each production $A \xrightarrow{p} \alpha_1 \alpha_2$ in which at least one $\alpha_i$ is in $T$ by $A \xrightarrow{p} B_1 B_2$. Here $B_i = \alpha_i$ if $\alpha_i \in N$. If $\alpha_i \in T$, then $B_i$ is a new nonterminal, and the production $B_i \xrightarrow{1} \alpha_i$ is inserted into the grammar.

The same strings of terminals are generated by the grammar before and after steps (a), (b), and (c), and the derived probabilities are unchanged. Thus the new grammar is equivalent to the old, because exactly the same strings with the same probabilities are generated.
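Step (a) amounts to solving $(I - U)W = V$, since $\sum_{n \ge 0} U^n = (I - U)^{-1}$ whenever $U^n \to 0$. Steps (b) and (c) are purely syntactic. The sketch below implements both under stated assumptions:

- matrices are lists of lists of `Fraction`s;
- productions are (A, p, rhs) triples, with rhs a tuple of symbols;
- the fresh names D1, D2, ... and T_a are hypothetical and are assumed not to clash with existing nonterminals.

Unlike step (c) as written, it shares one T_a per terminal instead of creating a new nonterminal per occurrence. This gives the same derived probabilities.

```python
from fractions import Fraction
from itertools import count

def close_unit_productions(U, V):
    """Step (a): W = (sum_{n>=0} U^n) V = (I - U)^{-1} V by exact Gauss-Jordan elimination.

    Row i of W gives the probabilities of the productions B_i -> zeta_j that replace
    the unit productions. Assumes U^n -> 0, so that I - U is invertible.
    """
    n = len(U)
    A = [[Fraction(int(i == j)) - U[i][j] for j in range(n)] + list(V[i]) for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [x / piv for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def to_cnf_shape(productions, nonterminals):
    """Steps (b) and (c): binarize long right sides, then replace terminals in pairs.

    New productions get probability 1, so derived probabilities are unchanged.
    """
    nonterminals = set(nonterminals)
    fresh = count(1)
    work, binary = list(productions), []
    while work:  # step (b): A -p-> alpha beta psi  =>  A -p-> alpha D, D -1-> beta psi
        A, p, rhs = work.pop()
        if len(rhs) <= 2:
            binary.append((A, p, rhs))
            continue
        D = f"D{next(fresh)}"
        nonterminals.add(D)
        binary.append((A, p, (rhs[0], D)))
        work.append((D, Fraction(1), rhs[1:]))
    result, term_var = [], {}
    for A, p, rhs in binary:  # step (c): a terminal a inside a pair becomes T_a -1-> a
        if len(rhs) == 2:
            new_rhs = []
            for s in rhs:
                if s not in nonterminals:
                    if s not in term_var:
                        term_var[s] = f"T_{s}"
                        result.append((term_var[s], Fraction(1), (s,)))
                    s = term_var[s]
                new_rhs.append(s)
            rhs = tuple(new_rhs)
        result.append((A, p, rhs))
    return result
```

Consider unit productions $B_1 \xrightarrow{1/2} B_2$ and $B_2 \xrightarrow{1/3} B_1$, with remaining mass $B_1 \xrightarrow{1/2} a$ and $B_2 \xrightarrow{2/3} b$. Then `close_unit_productions` returns the rows (3/5, 2/5) and (1/5, 4/5), each of which sums to 1.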

Theorem 5: Every admissible context free NP grammar can be transformed into an equivalent NP grammar in Greibach Normal Form (GNF), which means all productions are of the form
$$A \to b C_1 C_2 \ldots C_n \quad (A, C_1, \ldots, C_n \in N,\ b \in T,\ n \ge 0).$$

Algorithm: Given any admissible context free NP grammar $\hat G = (N, P, \Delta)$, eliminate all productions of the form $A \xrightarrow{p} B$ by the technique given in the proof of Theorem 4. Then define the set of handles of $\hat G$ as
$$H(\hat G) = \{\alpha \mid \alpha \text{ is the first symbol of some replacement string of } P\};$$
in this case, $\alpha$ is called a handle. Also define
$$M(\hat G) = \{A \mid A \text{ is the generatrix of some production with handle in } N\}.$$

The requirement for an admissible P grammar $\hat G$ to be in Greibach Normal Form has two parts. First, $\zeta$ must be a string of nonterminals for each production $A \xrightarrow{p} a\zeta$. Second, (1) $T \supseteq H(\hat G)$, or equivalently (2) $M(\hat G) = \emptyset$. Note that if any $\beta$ within $\zeta$ is a terminal, it can be replaced by a new nonterminal $C$ and a production $C \xrightarrow{1} \beta$ added to $P$. Thus our goal will be to obtain productions of the form $A \to a\beta_1 \beta_2 \ldots \beta_n$ with each $\beta_i \in N \cup T$.

The method of proof is analogous to the method used for nonprobabilistic grammars by Greibach [8]. It employs an iterative technique which generates at each step a new grammar $\hat G_A = (N_A, P_A, \Delta_A)$ which has one less $A \in N$ in the set of handles. New nonterminals are created, but the new productions added are such that no new symbols ever appear as a handle. Eventually, all handles become members of $T$. The construction and proof which follow are illustrated by an example.
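The sets $H(\hat G)$ and $M(\hat G)$ and the Greibach Normal Form test translate directly into set comprehensions. The sketch assumes productions are (A, p, rhs) triples. It is applied to the grammar of Example 2, whose productions are listed just below; the function names are illustrative.

```python
from fractions import Fraction

def handles(productions):
    """H(G): the first symbols of all replacement strings."""
    return {rhs[0] for _A, _p, rhs in productions}

def m_set(productions, nonterminals):
    """M(G): generatrices of productions whose handle is a nonterminal."""
    return {A for A, _p, rhs in productions if rhs[0] in nonterminals}

def is_gnf(productions, nonterminals):
    """GNF: every replacement string is a terminal followed by nonterminals only."""
    return all(rhs[0] not in nonterminals and all(s in nonterminals for s in rhs[1:])
               for _A, _p, rhs in productions)

N2 = {"A", "C"}
EXAMPLE_2 = [("A", Fraction(1, 2), ("a",)), ("A", Fraction(1, 2), ("C", "a")),
             ("C", Fraction(2, 3), ("b", "A")), ("C", Fraction(1, 3), ("A", "C", "C"))]
```

For Example 2 both nonterminals lie in $M(\hat G)$. The iterative procedure therefore has two nonterminals to remove from the set of handles.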
$T = \{a, b\}$, $N = \{A, C\}$, $\Delta = (1, 0)$
$P = \{A \xrightarrow{1/2} a,\ A \xrightarrow{1/2} Ca,\ C \xrightarrow{2/3} bA,\ C \xrightarrow{1/3} ACC\}$

Example 2.

For any $A \in N \cap M(\hat G)$, the following procedure eliminates $A$ from $M(\hat G)$ and from $H(\hat G)$:

(a) First construct a finite directed graph. One node $s_0$ is labelled by $A = A_0$. Then, for each node $s_i$ labelled $A_i$:

- For each production $A_i \xrightarrow{p} A_j \beta_1 \beta_2 \ldots \beta_n$, where $A_i, A_j \in N$ and $\beta_1, \beta_2, \ldots, \beta_n \in N \cup T$ ($n \ge 0$), create a node $s_j$ labelled $A_j$ (if one does not already exist). Create an arc from $s_i$ to $s_j$ labelled $\beta_1 \beta_2 \ldots \beta_n$.
- For each production $A_i \xrightarrow{p} c\, \beta_1 \beta_2 \ldots \beta_n$ where $c \in T$, create a new node labelled $t \in I$. Create an arc from $s_i$ to $t$ labelled $c\, \beta_1 \beta_2 \ldots \beta_n$.

At times, nodes will be denoted by their labels, since no two nodes have the same label. Numbered nodes are terminal and have no arcs leading out of them. Each arc is also labelled with the probability $p$ of the corresponding production. Repeat this process until all nodes accessible from $A$ and their arcs have been created. $P$ is a finite set, so this process will terminate after a finite number of steps.
Example 2: Taking $A = A_0$, we construct the following graph.

[Figure: graph for Example 2; arc labels include a, 1/2; a, 1/2; bA, 2/3]


(b) From this graph, a set of productions is obtainable. Put all productions of P into $\hat P$ except those with generatrix A. Put the productions of the form $A \to \xi$ and $B_i \to \psi$, which are created below, into $\hat P$. For each terminal node t, and for each simple path from $s_0$ to t, say $s_0 s_1 s_2 \cdots s_n$ with $t = s_n$, one can write a production

$$A_0 \xrightarrow{q} \zeta_n \zeta_{n-1} \cdots \zeta_1, \qquad q = \prod_{i=1}^{n} p_i,$$

where $\zeta_i$ and $p_i$ are, respectively, the string of symbols and the probability associated with the arc from $s_{i-1}$ to $s_i$, $i = 1, 2, \ldots, n$. In general, there may be several arcs from some $s_{i-1}$ to $s_i$, so $s_0 s_1 \cdots s_n$ actually specifies a set of productions, one for each possible sequence of arcs connecting the sequence of nodes.
Note: For any finite directed graph, one can always find all simple paths from any node $A_i$ to any node $A_j$, because one need only consider all sequences $s_0 s_1 \cdots s_n$ with $s_0 = A_i$ such that n is no greater than the number of nodes in the graph.
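For loop-free paths, the productions of part (b) and the bounded path search described in the note can be sketched as follows. The arc format `(source, target, label, p)`, the integer naming of terminal nodes, and the sample arc list (the graph that a hypothetical grammar A → Ca (1/2), A → a (1/2), C → bA (2/3), C → ACC (1/3) would give) are assumptions for illustration. Revisiting a node is simply refused here, so only case (1) of the probability definition, r = q, is covered; loops are not expanded.

```python
from fractions import Fraction
from math import prod

def simple_paths(arcs, source, is_terminal):
    """Yield every simple path from `source` to a terminal node, as a list
    of arcs (src, dst, label, p).  No node is visited twice, so a path
    never has more arcs than the graph has nodes."""
    def extend(node, visited, path):
        for arc in arcs:
            src, dst = arc[0], arc[1]
            if src != node or dst in visited:
                continue
            if is_terminal(dst):
                yield path + [arc]
            else:
                yield from extend(dst, visited | {dst}, path + [arc])
    yield from extend(source, {source}, [])

def path_production(path):
    """Right-hand side zeta_n ... zeta_1 and probability q = p_1 * ... * p_n."""
    rhs = tuple(sym for arc in reversed(path) for sym in arc[2])
    q = prod((arc[3] for arc in path), start=Fraction(1))
    return rhs, q

# Hypothetical arc list for the sample grammar described above.
arcs = [("A", "C", ("a",), Fraction(1, 2)), ("A", 1, ("a",), Fraction(1, 2)),
        ("C", 2, ("b", "A"), Fraction(2, 3)), ("C", "A", ("C", "C"), Fraction(1, 3))]
productions = [path_production(p)
               for p in simple_paths(arcs, "A", lambda n: isinstance(n, int))]
```

For this sample the two simple paths give the productions bAa with probability 1/3 and a with probability 1/2.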

Consider a particular simple path from $s_0$ to some terminal node t. Any node $s_i$ in this path $s_0 s_1 s_2 \cdots s_n$ ($s_n = t$) such that there is a path from $s_i$ to $s_i$ not containing any $s_j$ with $j < i$ is said to fulfill the loop condition with respect to the given path. Define one new nonterminal $B_i$ for each such $s_i$. For each possible combination $s_i, s_j, \ldots, s_k$ ($i < j < \cdots < k$) of one or more nodes fulfilling the loop condition, and for each sequence of arcs connecting $s_0, s_1, \ldots, s_n$, a production must be written of the form

$$A_0 \xrightarrow{r} \zeta_n \zeta_{n-1} \cdots \zeta_{k+1} B_k \zeta_k \cdots \zeta_{j+1} B_j \zeta_j \cdots \zeta_{i+1} B_i \zeta_i \cdots \zeta_1,$$

where $\zeta_\ell$ is the label of the arc connecting $s_{\ell-1}$ to $s_\ell$ ($\ell = 1, 2, \ldots, n$); $B_0$ is allowed, but not $B_n$; and r will be defined later.
(c) For each $s_i$ fulfilling the loop condition, productions must be created to describe all paths from $s_i$ to $s_i$. Erase all nodes $s_j$ of $s_0 s_1 \cdots s_n$ such that $j < i$, and all arcs connected to these $s_j$. Then split $s_i$ into two nodes, $s_i^1$ and $s_i^2$: $s_i^1$ has those arcs of $s_i$ leading out, and $s_i^2$ has those arcs leading into $s_i$, with associated strings and probabilities unchanged. Productions with generatrix $B_i$ must be constructed and put into $\hat P$ for each simple path from $s_i^1$ to $s_i^2$, using the technique of parts (b) and (c) of this proof with $s_0 = s_i^1$ and $s_n = s_i^2$. These productions, which are called $B_i$-loop generating productions, are constructed to allow looping, so $B_i$ must occur at the right end of the right-hand side of each to allow the loop to repeat. Finally, for each of these productions $B_i \xrightarrow{p} \zeta B_i$, where p is derived from part (b), another production, called a $B_i$-loop terminating production, must be added to $\hat P$ to allow the loop to terminate:

$$B_i \xrightarrow{p_i} \zeta, \qquad p_i = p\left(\frac{1 - r_i}{r_i}\right),$$

where $r_i$ is defined below. This procedure is recursive because steps (b) and (c) may need to be repeated many times if the path contains many loops within loops. Furthermore, the whole process is repeated for each simple path from $s_0$ to a terminal node. The probability of a production $(A_0 \xrightarrow{r} \xi) \in \hat P$ is defined recursively:
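(1) If $\xi = \zeta_n \zeta_{n-1} \cdots \zeta_1$, then $r = q$, which was previously defined.

(2) If $\xi$ contains new nonterminals $B_i, B_j, \ldots, B_k$, then $\xi = \zeta_n \zeta_{n-1} \cdots \zeta_{k+1} B_k \zeta_k \cdots \zeta_{j+1} B_j \zeta_j \cdots \zeta_{i+1} B_i \zeta_i \cdots \zeta_1$ and

$$r = q \left(\frac{r_i}{1 - r_i}\right)\left(\frac{r_j}{1 - r_j}\right) \cdots \left(\frac{r_k}{1 - r_k}\right),$$

where $r_\ell$ is the sum of the probabilities $\mathrm{pr}(B_\ell \to \zeta B_\ell)$ over all $B_\ell$-loop generating productions of $\hat P$, $\ell = i, j, \ldots, k$.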



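$\hat G$ for Example 2:

$$A \xrightarrow{1/3} bAa, \qquad A \xrightarrow{1/15} bAaB, \qquad A \xrightarrow{1/2} a, \qquad A \xrightarrow{1/10} aB,$$

$$B \xrightarrow{1/6} CCaB \quad \text{(B-loop generating production)}, \qquad B \xrightarrow{5/6} CCa \quad \text{(B-loop terminating production)},$$

$$C \xrightarrow{2/3} bA, \qquad C \xrightarrow{1/3} ACC.$$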
The recursion terminates because simple loops from $B_i$ to $B_i$ are sequences of length at most $|N|$. A simple loop is a path $s_0 s_1 \cdots s_n$ such that $s_0 = s_n$ and $s_1 s_2 \cdots s_n$ is a simple path. At the second level of recursion, we consider simple loops emanating from some node within sequences from $B_i$ to $B_i$. These may be expressed as simple loops from some $B_i^{(2)}$ to $B_i^{(2)}$. Notice that these paths do not have $B_i$ in them, so they are of length $\le |N| - 1$. Similarly, at the (m+1)-st level of recursion, simple paths are of length $\le |N| - m$, because the nodes $B_i, B_i^{(2)}, \ldots, B_i^{(m)}$ cannot occur. This is shown by the following argument. If $s_i \ne s_0$, then the algorithm erases the initial node, and the final node, by construction, has no arcs going out of it, so it cannot contribute to any loops; if $s_i = s_0$, then it was constructed by splitting some node, and so it has no incoming arcs and therefore no loops. At recursion level $|N|$, simple paths are of maximum length 1. There can be no further loops, so part (1) of this recursive definition applies.

Lemma 5.1: The P grammar $\hat G$ is normalized.


Proof: 

Case 1: Suppose $C \in N$, $C \ne A$. Then the productions $C \xrightarrow{p} \xi$ of $\hat P$ are exactly those of P. So, by the normalization of G, the sum of their probabilities must equal one.
Case 2: Suppose $C \in N$, $C = A$. If there are no loops, then a proof by induction on the maximum number of nodes in a simple path from A to terminal nodes can be given.

(a) Let n = 1 be the number of nodes minus one (i.e., the maximum path has two nodes, $s_0$ and $s_1$); then the following diagram shows the situation.

[Diagram: $A_0$ with arcs $\zeta_i$ leading directly to terminal nodes.]

In this case, G and $\hat G$ have the same productions, $A_0 \xrightarrow{p_i} \zeta_i$.

(b) Let n > 1 be the maximum number of nodes minus one in a simple path of the graph for $\hat G$, and let the lemma hold for all NP grammars with graphs having no more than n nodes in every simple path and having no loops. There is a one-to-one correspondence between productions $A_0 \to \zeta_n \zeta_{n-1} \cdots \zeta_1$ of the P grammar $\hat G$ and derivations $A_0 \Rightarrow \zeta_n \zeta_{n-1} \cdots \zeta_1$ with respect to G such that, by construction, the probabilities are the same. Thus the sum of probabilities of productions of $\hat G$ with generatrix A is

$$\sum_{\substack{a \in T \\ \xi \in (N \cup T)^*}} \mathrm{pr}(A_0 \Rightarrow a\xi) = \sum_{j=1}^{|N|-1} p_j \Bigl( \sum_{\substack{a \in T \\ \xi \in (N \cup T)^*}} \mathrm{pr}(A_j \Rightarrow a\xi) \Bigr) + \sum_{\substack{a \in T \\ \xi \in (N \cup T)^*}} \mathrm{pr}(A_0 \to a\xi),$$

where all derivations are with respect to G and $p_j = \sum_{\zeta \in (N \cup T)^*} \mathrm{pr}(A_0 \to A_j \zeta)$. Since there are no loops, each $A_j \Rightarrow a\xi$ corresponds to a path with no more than n nodes, so by the induction hypothesis

$$\sum_{\substack{a \in T \\ \xi \in (N \cup T)^*}} \mathrm{pr}(A_j \Rightarrow a\xi) = 1$$

for $j = 1, 2, \ldots, |N|-1$. Then, by the normalization criterion, the remaining sum equals one:

$$\sum_{\substack{a \in T \\ \xi \in (N \cup T)^*}} \mathrm{pr}(A_0 \Rightarrow a\xi) = \sum_{j=1}^{|N|-1} \sum_{\zeta \in (N \cup T)^*} \mathrm{pr}(A_0 \to A_j \zeta) + \sum_{\substack{a \in T \\ \xi \in (N \cup T)^*}} \mathrm{pr}(A_0 \to a\xi) = 1.$$

(c) For the most general case, we use induction on h, the maximum number of nodes fulfilling the loop condition. The lemma has been proven by (a) and (b) for h = 0. Next assume the lemma is valid for all values 1, 2, ..., h-1 with h > 0. By omitting a node $s_i$ fulfilling the loop condition in the path with the maximal number of nodes fulfilling the loop condition, and by adjusting the probabilities of the remaining sequences of arcs out of $s_i$, the graph now represents a new NP grammar G' under the algorithm of Theorem 5.


[Figure: graph for G'.]

Here $V_k$ is the probability of the k-th path from $s_0$ to $s_i$ ($k = 1, 2, \ldots, \ell$), and $u_j$ is the probability of the j-th path from $s_i$ to a terminal ($j = 1, 2, \ldots, m$); t in the picture represents all terminal nodes. In G', $\sum_{j=1}^{m} u'_j = 1$, and in G, $\sum_{j=1}^{m} u_j + r = 1$ by the normalization of G, where r is the probability of looping back to $s_i$; the construction of G' takes $u'_j = u_j/(1 - r)$. By the induction hypothesis, the productions of G' with A as generatrix sum to one. These are productions of the following form:

$$A_0 \xrightarrow{u'_j V_k} \psi_j \varphi_k$$

for paths passing through node $s_i$, where $\psi_j$ is the string along the j-th path from $s_i$ to a terminal and $\varphi_k$ the string along the k-th path from $s_0$ to $s_i$; their probabilities sum to $\sum_{j=1}^{m} \sum_{k=1}^{\ell} u'_j V_k = \sum_{k=1}^{\ell} V_k$. The analogous productions of $\hat G$ are of the form

$$A_0 \xrightarrow{u_j V_k} \psi_j \varphi_k \qquad \text{and} \qquad A_0 \xrightarrow{u_j V_k \left(\frac{r}{1-r}\right)} \psi_j B_i \varphi_k,$$

where r is, by definition, $1 - \sum_{j=1}^{m} u_j$. Summing over these two types of productions yields

$$\sum_{j=1}^{m} \sum_{k=1}^{\ell} \left( u_j V_k + u_j V_k \frac{r}{1-r} \right) = \sum_{k=1}^{\ell} V_k \left( \sum_{j=1}^{m} u_j + r \, \frac{\sum_{j=1}^{m} u_j}{1-r} \right) = \sum_{k=1}^{\ell} V_k \left( \sum_{j=1}^{m} u_j + r \right) = \sum_{k=1}^{\ell} V_k,$$

which is identical to the sum for G', so the total probability over all productions of $\hat G$ with generatrix A sums to one. This technique can be applied to other paths not passing through node $s_i$, so the above proof applies if several paths have the maximum number of nodes fulfilling the loop condition.
Case 3: Suppose $C = B_i \in \hat N - N$. By definition, the $B_i$-loop generating productions sum to $r_i$, and by construction the $B_i$-loop terminating productions sum to $r_i \left(\frac{1 - r_i}{r_i}\right) = 1 - r_i$. The two sums together give a total of 1.

Lemma 5.2: G and $\hat G$ generate the same P language.

Proof: $\hat G$ differs from G only in rules with generatrix A, and in new rules. The same set of strings is still generated from A, and other members of N are not affected. $\hat\Xi$ has zero entries for all new nonterminals, so that no new generations occur, and the old nonterminals retain the same values as in $\Xi$. $\hat\Xi$ is a stochastic vector because $\Xi$ is stochastic. We need only show that any derivation of a terminal string x has the same probability under G and $\hat G$. Consider any derivation which corresponds to a path from node $A_0$ to a terminal node, i.e., $A_0 \Rightarrow a\xi$. The general case can be inferred from this. We claim that, using G or $\hat G$, the probability of this derivation is the same. To use induction on the recursion level of the path (i.e., the number of simple loops within simple loops), first assume the path $s_0 s_1 \cdots s_n$ is simple. The probability of this path using G is the product of the probabilities of the derivation steps. If $A_{i-1} \xrightarrow{p_i} A_i \zeta_i$ ($0 < i < n$) and $A_{n-1} \xrightarrow{p_n} a \zeta_n$, then the probability is $\prod_{i=1}^{n} p_i$. By the construction of $\hat G$, there exists a production $A_0 \xrightarrow{q} a \zeta_n \zeta_{n-1} \cdots \zeta_1$ with $q = \prod_{i=1}^{n} p_i$, so the probability of the derivation is the same using G or $\hat G$.
Consider a derivation with recursion level m, i.e., loops nested up to m times within other loops. Then, at the m-th level of recursion, there are only simple loops emanating from any $B_i^{(m)}$. Suppose that in this derivation the $B_i^{(m)}$ node is traversed k+1 times within a $B_i^{(m-1)}$ loop. The probability of this using G is

$$\prod_{j=1}^{\ell} t_j^{m_j},$$

where there may be many simple loops emanating from $B_i^{(m)}$, $\ell$ is the number of different simple loops traversed, $m_j$ is the number of times the j-th loop was traversed, and $t_j$ is the probability associated with the j-th loop. In $\hat G$, there is a production $B_i^{(m)} \xrightarrow{t_j} \zeta B_i^{(m)}$ for the j-th loop; the k-th $B_i^{(m)}$ loop is the final one and uses a loop terminating production

$$B_i^{(m)} \xrightarrow{t_j \left(\frac{1 - r_i}{r_i}\right)} \zeta.$$

Thus the product will be $\left(\frac{1 - r_i}{r_i}\right) \prod_{j=1}^{\ell} t_j^{m_j}$.

By the induction hypothesis, the probability is the same using G or $\hat G$ for any path of recursion level m-1, so let $q_0$ be the probability of the path under discussion when all $B_i$ loops are removed. Then $q_0 \prod_{j=1}^{\ell} t_j^{m_j}$ is the probability, using G, of the path obtained when the k loops of $B_i$ are added. Under $\hat G$, the addition of the k loops means that a production $B_i^{(m-1)} \xrightarrow{q_0} \zeta_n \zeta_{n-1} \cdots \zeta_1$ must be replaced by

$$B_i^{(m-1)} \xrightarrow{q_0 \left(\frac{r_i}{1 - r_i}\right)} \zeta_n \cdots \zeta_{i+1} B_i^{(m)} \zeta_i \cdots \zeta_1.$$

The probability using $\hat G$ then becomes

$$\left(q_0 \frac{r_i}{1 - r_i}\right)\left(\frac{1 - r_i}{r_i} \prod_{j=1}^{\ell} t_j^{m_j}\right) = q_0 \prod_{j=1}^{\ell} t_j^{m_j},$$

which is identical to the probability using G. This result can be used, and the above argument repeated to add further $B_j$ loops, to show that the probability using G or $\hat G$ is the same for any path of recursion level m.
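The cancellation at the heart of Lemma 5.2 can be checked numerically. The sketch below compares the probability of a derivation that traverses loops with probabilities $t_1, t_2, \ldots$ under G with the corresponding derivation under $\hat G$, which uses one production weighted by $r/(1-r)$, generating productions for every loop but the last, and a terminating production weighted by $(1-r)/r$. The function names and sample values are illustrative assumptions.

```python
from fractions import Fraction as F

def prob_G(q0, loop_probs):
    """Original grammar: path probability q0 times each traversed loop."""
    result = q0
    for t in loop_probs:
        result *= t
    return result

def prob_G_hat(q0, loop_probs, r):
    """Transformed grammar: the production carrying B_i has probability
    q0 * r/(1 - r); each loop except the last uses a generating production
    (probability t), and the last uses the terminating one, t(1 - r)/r."""
    if not loop_probs:              # no loop: the loop-free production, r = q
        return q0
    result = q0 * r / (1 - r)
    for t in loop_probs[:-1]:
        result *= t
    return result * loop_probs[-1] * (1 - r) / r

# One loop of probability 1/6 traversed three times, q0 = 1/2, r = 1/6.
same = prob_G(F(1, 2), [F(1, 6)] * 3) == prob_G_hat(F(1, 2), [F(1, 6)] * 3, F(1, 6))
```

The factors $r/(1-r)$ and $(1-r)/r$ cancel exactly, so the two probabilities agree for any number of loops.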

Several things can be mentioned about $\hat G$: $A \notin M(\hat G)$, i.e., all productions $A \to \xi$ have handles in T; and $\hat N \cap H(\hat G) \subseteq N$, i.e., no new symbol is a handle. If A is a handle of any production $C \to A\psi$, then the following substitution changes this handle to a set of terminals as handles. Suppose the productions with generatrix A in $\hat P$ are

$$A \xrightarrow{p_1} a_1 \xi_1, \qquad A \xrightarrow{p_2} a_2 \xi_2, \qquad \ldots, \qquad A \xrightarrow{p_n} a_n \xi_n.$$

Then $C \xrightarrow{q} A\psi$ can be replaced by

$$C \xrightarrow{p_1 q} a_1 \xi_1 \psi, \qquad C \xrightarrow{p_2 q} a_2 \xi_2 \psi, \qquad \ldots, \qquad C \xrightarrow{p_n q} a_n \xi_n \psi.$$

Thus $A \notin H(\hat G)$.

Final $\hat G$ for Example 2: The P grammar contains $C \xrightarrow{1/3} ACC$, with handle A. Using the productions with generatrix A, we replace this production by

$$C \xrightarrow{1/9} bAaCC, \qquad C \xrightarrow{1/45} bAaBCC, \qquad C \xrightarrow{1/6} aCC, \qquad C \xrightarrow{1/30} aBCC.$$
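The handle substitution is mechanical. A minimal sketch, assuming one-character symbols so that right-hand sides can be plain strings, and using the A-productions of $\hat G$ for Example 2 with probabilities 1/3, 1/15, 1/2 and 1/10; the function name is hypothetical.

```python
from fractions import Fraction as F

def substitute_handle(rule, a_rules):
    """Replace C -q-> A psi by C -(p_i q)-> a_i xi_i psi for every
    production A -p_i-> a_i xi_i.  Rules are (lhs, rhs, probability)."""
    lhs, rhs, q = rule
    head, psi = rhs[0], rhs[1:]
    return [(lhs, body + psi, p * q)
            for a_lhs, body, p in a_rules if a_lhs == head]

A_rules = [("A", "bAa", F(1, 3)), ("A", "bAaB", F(1, 15)),
           ("A", "a", F(1, 2)), ("A", "aB", F(1, 10))]
new_C_rules = substitute_handle(("C", "ACC", F(1, 3)), A_rules)
```

Since the A-productions are normalized, the new C-productions together carry exactly the probability 1/3 of the production they replace.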


Proof (of Theorem 5): Let $\hat G_0 = G = (N, P, \Xi)$. Consider $\hat G_i$. If $N \cap M(\hat G_i) = \emptyset$, we stop; if not, the algorithm is used to find $\hat G_i' = \hat G_{i+1} = (\hat N_{i+1}, \hat P_{i+1}, \hat\Xi_{i+1})$ such that $L(\hat G_i) = L(\hat G_{i+1})$ and

$$M(\hat G_{i+1}) \cap N \subset M(\hat G_i), \qquad \hat N_{i+1} \cap H(\hat G_{i+1}) \subseteq N, \qquad \hat N_{i+1} \cap H(\hat G_{i+1}) \subseteq M(\hat G_{i+1}).$$

At each step, one member of $M(\hat G_i) \cap N$ is eliminated and new members are never added. No new nonterminal symbols ever become handles. Thus, since N is finite, we eventually reach an n such that $M(\hat G_n) \cap N = \emptyset$. Consider $M(\hat G_n)$. If $M(\hat G_n) = \emptyset$, we are finished. Suppose $C \in M(\hat G_n)$. Then there exists a production $C \xrightarrow{p} A\psi$ with A a nonterminal. By construction, A cannot be a new symbol, i.e., $A \in N$. Then $A \notin M(\hat G_n)$, since $M(\hat G_n) \cap N = \emptyset$, and $A \in H(\hat G_n)$; but by construction $H(\hat G_n) \cap N \subseteq M(\hat G_n)$. This contradiction implies $C \notin M(\hat G_n)$; thus $M(\hat G_n) = \emptyset$ and $H(\hat G_n) \subseteq T$, so $\hat G_n$ is the sought-after P grammar.

Theorem 6: There exist context free languages L with probabilities $\mu(x)$ attached to strings of L which cannot be generated by any context free P grammar.
Proof: This theorem is the context free analogue of Theorem 2. The example and proof given there are still valid if the P grammar is generalized to be any context free P grammar.

The context free extension of Theorem 1, however, is false. The following example of a normalized P grammar which generates an un-normalized P language was suggested by D. E. Muller.

Example 3:

$$A \xrightarrow{2/3} AA, \qquad A \xrightarrow{1/3} a.$$

This NP grammar generates a, 1/3; aa, 2/27; aaa, 8/243; ...; and the total probability is $\sum_{n=1}^{\infty} \mu(a^n) = 1/2$.
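The probabilities listed for Example 3 can be reproduced by counting derivation trees: a derivation of $a^n$ uses $n-1$ applications of $A \to AA$ and $n$ of $A \to a$, and there are Catalan$(n-1)$ distinct binary derivation trees with n leaves, so $\mu(a^n) = C_{n-1} (2/3)^{n-1} (1/3)^n$. This counting argument is a standard combinatorial fact rather than a statement of the text; the sketch below uses it to check the listed values and the total of 1/2.

```python
from fractions import Fraction
from math import comb

def mu(n, p=Fraction(2, 3), q=Fraction(1, 3)):
    """Probability that A -p-> AA, A -q-> a derives a^n: each of the
    Catalan(n-1) derivation trees uses n-1 branching and n leaf rules."""
    catalan = comb(2 * (n - 1), n - 1) // n
    return catalan * p ** (n - 1) * q ** n

first_three = [mu(n) for n in (1, 2, 3)]
partial_sum = sum(float(mu(n)) for n in range(1, 300))
```

The terms shrink roughly like $(8/9)^n$, so a few hundred terms already place the partial sum within floating-point precision of 1/2.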

Theorem 7: There exist context free NP grammars over T which do not generate NP languages over $T^*$.

Proof: First, a general criterion for an NP grammar to generate an NP language will be developed. Then the proof of the theorem is simply to observe that Example 3 does not fulfill the criterion.

Let $G = (N, P, \Xi)$ be an admissible context free NP grammar. By Theorem 4, G can be put into Chomsky normal form ($A_i \to A_j A_k$ or $A_i \to a$). For any particular $A_i$, define a matrix $Z_i$ with entries $p_{ijk} = \mathrm{pr}(A_i \to A_j A_k)$. Also define a column vector $Y(t)$ with entries $y_i(t)$ = the probability of derivation of a terminal string within t derivation steps, given a starting nonterminal of $A_i$ (here the length of a derivation is the height of its derivation tree), and

$$y_i(1) = q_i = \sum_{a \in T} \mathrm{pr}(A_i \to a).$$

The total probability of a derivation of length $\le t$ is $\sigma(t) = \sum_{i=1}^{|N|} \xi_i y_i(t)$. A derivation $A_i \Rightarrow x$ of length less than or equal to t+1 can be obtained by (1) a production $A_i \to a$, or (2) $A_i \to A_j A_k$, where both $A_j$ and $A_k$ yield derivations of length at most t. Thus we can write the equation

$$y_i(t+1) = \sum_{j=1}^{|N|} \sum_{k=1}^{|N|} p_{ijk} \, y_j(t) \, y_k(t) + q_i.$$

In matrix form, this is

$$y_i(t+1) = Y^T(t) Z_i Y(t) + y_i(1).$$
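The recursion for $Y(t)$ translates directly into a fixed-point iteration. A minimal sketch, assuming the grammar is given in Chomsky normal form as nested lists `p[i][j][k]` $= p_{ijk}$ and a list `q` of the $q_i$; floats are used because exact fractions grow rapidly under repeated squaring. By Lemmas 7.1 and 7.2 the iterates should be nondecreasing and bounded by 1.

```python
def iterate_y(p, q, steps):
    """Return Y(steps) for y_i(t+1) = sum_jk p[i][j][k] y_j(t) y_k(t) + q_i,
    starting from y_i(1) = q_i."""
    n = len(q)
    y = list(q)                                  # Y(1)
    for _ in range(steps - 1):
        y = [sum(p[i][j][k] * y[j] * y[k]
                 for j in range(n) for k in range(n)) + q[i]
             for i in range(n)]
    return y

# Example 3: one nonterminal with p_111 = 2/3 and q_1 = 1/3.
y_example3 = iterate_y([[[2 / 3]]], [1 / 3], 200)
```

For Example 3 the iterates climb toward 1/2 rather than 1, which is the behavior the criterion below makes precise.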

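Lemma 7.1: $y_i(t) \le 1$, $i = 1, 2, \ldots, |N|$, implies $y_i(t+1) \le 1$, $i = 1, 2, \ldots, |N|$.

Assume $y_i(t) \le 1$ for all i, for some t. Then, since all entries are non-negative,

$$y_i(t+1) = Y^T(t) Z_i Y(t) + y_i(1) \le (1, 1, \ldots, 1) \, Z_i \, (1, 1, \ldots, 1)^T + q_i = \sum_j \sum_k p_{ijk} + y_i(1) = 1,$$

the last equality holding by the normalization of G. Thus $y_i(t+1) \le 1$.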

Lemma 7.2: $y_i(t+1) \ge y_i(t)$.

This is easily shown by induction on t.

Case t = 1: $y_i(2) = Y^T(1) Z_i Y(1) + y_i(1) \ge y_i(1)$.

Case t = m > 1: Suppose $y_i(t+1) \ge y_i(t)$ for t = m-1. Then

$$y_i(m+1) = \sum_j \sum_k p_{ijk} \, y_j(m) \, y_k(m) + y_i(1),$$

$$y_i(m) = \sum_j \sum_k p_{ijk} \, y_j(m-1) \, y_k(m-1) + y_i(1).$$

Each component satisfies $y_j(m) \ge y_j(m-1)$ and $y_k(m) \ge y_k(m-1)$ by the induction hypothesis, so the first sum is at least as great: $y_i(m+1) \ge y_i(m)$.

Lemmas 7.1 and 7.2 show that $y_i(t)$ must approach some fixed point as t approaches infinity (a bounded, monotone increasing sequence converges), and $(1, 1, \ldots, 1)^T$ is a fixed point vector. Suppose $\lim_{t \to \infty} Y(t) = Y$; then Y must be a fixed point vector. This means

$$y_i = Y^T Z_i Y + q_i,$$

$$y_i = \sum_{\substack{j=1 \\ j \ne i}}^{|N|} \sum_{\substack{k=1 \\ k \ne i}}^{|N|} y_j \, p_{ijk} \, y_k + \sum_{\substack{j=1 \\ j \ne i}}^{|N|} y_j \, p_{iji} \, y_i + \sum_{\substack{k=1 \\ k \ne i}}^{|N|} y_i \, p_{iik} \, y_k + y_i \, p_{iii} \, y_i + q_i,$$

$$y_i^2 \, p_{iii} + y_i \Bigl( \sum_{j \ne i} y_j (p_{iji} + p_{iij}) - 1 \Bigr) + \Bigl( \sum_{j \ne i} \sum_{k \ne i} p_{ijk} \, y_j \, y_k + q_i \Bigr) = 0.$$

Solving this quadratic for $y_i$ yields

$$y_i = \frac{1 - \sum_{j \ne i} y_j (p_{iji} + p_{iij}) \pm \sqrt{\Bigl( \sum_{j \ne i} y_j (p_{iji} + p_{iij}) - 1 \Bigr)^2 - 4 p_{iii} \Bigl( \sum_{j \ne i} \sum_{k \ne i} p_{ijk} \, y_j \, y_k + q_i \Bigr)}}{2 p_{iii}},$$

or, if $p_{iii} = 0$, a simpler linear equation results.


This system of $|N|$ equations has a solution $Y = (1, 1, \ldots, 1)^T$. This solution implies

$$\lim_{t \to \infty} \sigma(t) = \sum_{i=1}^{|N|} \xi_i y_i = \sum_{i=1}^{|N|} \xi_i = 1.$$

Criterion: An NP grammar G generates an NP language iff the equation has no solution vector smaller than $(1, 1, \ldots, 1)$, where a smaller solution vector Y means $\forall i,\ 0 \le y_i \le 1$ and $\exists i,\ 0 \le y_i < 1$ ($i = 1, 2, \ldots, |N|$).

In case this Y solution exists, $\lim_{t \to \infty} \sigma(t) = \sum_{i=1}^{|N|} \xi_i y_i < 1$ is the probability of a terminal string being generated by G, since the smallest solution is always the one approached. That this is the case can be shown by the following argument.

Since both $Z_i$ and Y must have all non-negative components (because G is normalized), $y_i = Y^T Z_i Y + q_i \ge q_i$ for any solution Y. Replacing the unit vector $(1, 1, \ldots, 1)$ by the smallest solution vector Y in Lemma 7.1 yields a proof that $y_i(t) \le y_i$ for all t; so $y_i(t) \uparrow y_i$.

In the example $A \xrightarrow{2/3} AA$, $A \xrightarrow{1/3} a$, we have $A_1 = A$, $q_1 = 1/3$, $p_{111} = 2/3$, so

$$y_1 = \frac{1 \pm \sqrt{1 - 4(2/3)(1/3)}}{2(2/3)};$$

the solutions are 1 and 1/2. Thus

$$\sum_{n=1}^{\infty} \mu(a^n) = \sum_{i=1}^{|N|} \xi_i y_i = 1/2.$$

This example, therefore, does not fit the criterion. Furthermore, by solving this equation for the more general case $p_{111} = p$ and $q_1 = q = 1 - p$, we get

$$y_1 = \frac{1 \pm \sqrt{1 - 4p + 4p^2}}{2p}.$$

The solutions are 1 and q/p. Thus $A \xrightarrow{p} AA$, $A \xrightarrow{q} a$ yields

$$\sum_{n=1}^{\infty} \mu(a^n) = \begin{cases} q/p & \text{if } p \ge q, \\ 1 & \text{if } p < q. \end{cases}$$
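For the one-nonterminal grammar $A \xrightarrow{p} AA$, $A \xrightarrow{q} a$, the sketch below compares the iterated limit of $y(t+1) = p\,y(t)^2 + q$ with the closed form. Values of p near 1/2 are avoided in the check because convergence there is very slow (for p = 1/2 the fixed point 1 is a double root). The function names are illustrative.

```python
def iterated_limit(p, iterations=10_000):
    """Approximate the limit of y(t+1) = p*y(t)**2 + q, y(1) = q, q = 1 - p."""
    q = 1.0 - p
    y = q
    for _ in range(iterations):
        y = p * y * y + q
    return y

def closed_form(p):
    """Smaller fixed point, clipped at 1: q/p if p >= q, otherwise 1."""
    q = 1.0 - p
    return q / p if p >= q else 1.0

comparison = {p: (iterated_limit(p), closed_form(p)) for p in (0.3, 0.6, 0.8)}
```

The iteration settles on the smaller of the two roots, in agreement with the remark that the smallest solution is always the one approached.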

The significance of this result is that P grammars may have a nonzero probability of an endless cycle of generation. This leads to the concept of nonterminating P grammars, which generate infinite strings. Define $T^\omega$ as the set of all infinite strings of symbols of T.

Definition: A generalized P grammar G is strictly nonterminating iff (1) it is in Greibach normal form; (2) it contains no productions of the form $A \xrightarrow{p} a$ (these are called terminating productions).

A strictly nonterminating P grammar must be considered to generate strings of $T^\omega$. By introducing tree automata in the next chapter, an analogue for context free P grammars of Theorem 3 will be proven. The following theorems will be an immediate consequence of the theorems for tree automata.

Theorem 8: Every strictly nonterminating left linear NP grammar generates a P language $(T^\omega, \mu)$ such that $\mu$ is normalized (i.e., $\mu$ is a probability measure).

Theorem 9: Every strictly nonterminating context free NP grammar generates a P language $(T^\omega, \mu)$ such that $\mu$ is normalized.

It will also be shown that P grammars which generate strings over $T^* \cup T^\omega$ can be considered as a sub-class of P grammars over $T^\omega$.
6. PROBABILISTIC TREE AUTOMATA

Definition: A tree is a set $D \subseteq I^*$, where $I^*$ is the set of all finite strings of positive integers and where D satisfies the following three requirements. Let $d = (n_1 n_2 \cdots n_k)$, $n_i \in I$; then $dn = (n_1 n_2 \cdots n_k n)$, $n \in I$.

1. $dn \in D$, $n > 1 \Rightarrow d(n-1) \in D$.

2. $dn \in D$, $n = 1 \Rightarrow d \in D$.

3. If the set $\{n \mid dn \in D, n \in I\} \ne \emptyset$, then $\max_{dn \in D} (n) < \infty$.

Note: $(n_1 \cdots n_k)$ in the case k = 0 is the empty string $\lambda$, which is considered as the root of the tree. Each $d \in D$ is a node of the tree, and if $n_+ = \max_{dn \in D} (n)$, then node d goes down into $n_+$ other nodes $d1, d2, \ldots, dn_+$, from left to right. If $\{n \mid dn \in D\} = \emptyset$, then $n_+$ is defined as $n_+ = 0$. In this case, d is a terminal node of D.
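The requirements can be checked mechanically for a finite set of nodes. A minimal sketch, assuming nodes are represented as Python tuples of positive integers with the empty tuple as the root $\lambda$; requirement 3 holds automatically for finite D, so only requirements 1 and 2 are tested. The function names are hypothetical.

```python
def is_tree(D):
    """Check requirements 1 and 2 for a finite set D of tuples of
    positive integers (requirement 3 is automatic for finite D)."""
    for node in D:
        if not node:                          # the root lambda
            continue
        d, n = node[:-1], node[-1]
        if n < 1:
            return False
        if n > 1 and d + (n - 1,) not in D:   # requirement 1
            return False
        if n == 1 and d not in D:             # requirement 2
            return False
    return True

def branching(D, d):
    """n_+ for node d: its number of children, 0 for a terminal node."""
    return max((c[-1] for c in D if len(c) == len(d) + 1 and c[:-1] == d),
               default=0)

D = {(), (1,), (2,), (1, 1)}
```

For this D the root goes down into two nodes, (1,) into one, and (2,) is a terminal node.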

Definition: A valued tree over a finite alphabet T is a pair (v, D), where D is a tree and v is a function $v: D \to T$. "Valued tree" will sometimes be abbreviated to "tree" when there is no possibility of confusion with the previous definition of tree.

Define $\ell(n_1 n_2 \cdots n_k) = k$. The length of a tree is $\sup_{d \in D} \ell(d)$. Define the m-th level of a tree to be the set $\{d \in D \mid \ell(d) = m\}$. If $(\exists m \in I)(\forall d \in D)[\ell(d) < m]$, then (v, D) is called a finite valued tree. Define a subtree of (v, D) to be (v', D'), where $D' \subseteq D$ is itself a tree, and the function v' is v on the restricted domain D', $v' = v|_{D'}$. The set of all finite valued trees over T is denoted by T. If $d \in D$ implies $d1 \in D$, then (v, D) is a full infinite valued tree. T denotes the set of all full infinite valued trees over T.

Definition: A Probabilistic Tree Automaton (PTA) over T is a triple $(Q, M, \Xi)$, where Q is a finite set of states, $\Xi$ is the initial state distribution vector, and M is the next state function, $M: Q \times T \to \mathcal{P}(Q^*)$. Associated with each $q \in Q$, $a \in T$ is a function $p_{q,a}$ such that $D(p_{q,a}) = M(q,a)$ and $R(p_{q,a}) = \mathbb{R}$ (transition probability).

An elementary transition $\tau_k$ is a change consistent with M from state q under input a to a sequence of states $Q_k = q_{k_1} q_{k_2} \cdots q_{k_n}$. (Consistent with M means $Q_k \in M(q,a)$.) Thus a transition out of any state may go into many states rather than into a single next state. The nonzero probability of this elementary event is $p(\tau_k) = p_{q,a}(Q_k)$.

A probabilistic tree automaton is normalized (NPTA) iff for all states q the total probability of leaving q is one (i.e., $\sum_{a \in T} \sum_{Q_k \in M(q,a)} p_{q,a}(Q_k) = 1$), $0 < p_{q,a}(Q_k) \le 1$, and $\Xi$ is a stochastic vector (i.e., $\sum_{q \in Q} \xi(q) = 1$, where $\xi: Q \to \mathbb{R}$ is a function which assigns to each state q its proper probability value from the vector $\Xi$).
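Normalization of a PTA can be checked directly from the definition. The encoding below is an assumption: `M` maps a pair `(q, a)` to a dictionary from state sequences (tuples) to probabilities, a missing key means $M(q,a) = \emptyset$, and the empty tuple stands for $\lambda$. The two-state sample automaton and its probabilities are hypothetical.

```python
from fractions import Fraction as F

def is_normalized(states, alphabet, M, xi):
    """NPTA test: for each state q the probabilities p_{q,a}(Q_k) over all
    a and all Q_k in M(q, a) lie in (0, 1] and sum to 1, and xi is a
    stochastic vector."""
    for q in states:
        total = 0
        for a in alphabet:
            for Qk, p in M.get((q, a), {}).items():
                if not 0 < p <= 1:
                    return False
                total += p
        if total != 1:
            return False
    return all(x >= 0 for x in xi.values()) and sum(xi.values()) == 1

# Hypothetical two-state automaton over {a, b}.
states, alphabet = {"qA", "qB"}, {"a", "b"}
M = {("qA", "a"): {("qA",): F(1, 3), ("qA", "qB"): F(2, 3)},
     ("qB", "b"): {("qB",): F(1, 2), ("qA", "qB"): F(1, 2)}}
xi = {"qA": F(1), "qB": F(0)}
```

Using exact fractions keeps the equality test `total != 1` free of rounding error.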


Definition: A PTA is complete iff $(\forall q \in Q)(\exists b \in T)[M(q,b) \ne \emptyset]$. Thus, complete means there is always a positive probability of exiting from any state q.

Definition: A PTA A is strictly nonterminating if

(1) A is complete;

(2) $(\forall q \in Q)(\forall a \in T)[\lambda \notin M(q,a)]$.

Any complete PTA can be altered to form a strictly nonterminating PTA by adding a new state $q_t$ if there are elementary transitions $\lambda \in M(q,a)$, and replacing these by elementary transitions $q_t \in M(q,a)$ with probabilities equal to those of the omitted transitions. Next, add a new blank symbol b to the alphabet and the transition

$$M(q_t, b) = \{q_t\}, \qquad p_{q_t, b}(q_t) = 1.$$
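The alteration into a strictly nonterminating PTA can be sketched with a hypothetical encoding in which `M` maps `(q, a)` to a dictionary from state tuples to probabilities and `()` stands for $\lambda$. The names `q_t` and `#` for the new state and the new blank symbol are assumptions; `#` is used instead of b only to avoid clashing with an existing input symbol.

```python
from fractions import Fraction as F

def make_strictly_nonterminating(states, alphabet, M, q_t="q_t", blank="#"):
    """Replace each terminal elementary transition lambda in M(q, a) by the
    one-state sequence (q_t,) with the same probability, then add the blank
    symbol with M(q_t, blank) = {q_t} at probability 1."""
    new_M = {}
    used_lambda = False
    for key, table in M.items():
        new_table = {}
        for Qk, p in table.items():
            if Qk == ():                  # lambda: the automaton would stop
                Qk, used_lambda = (q_t,), True
            new_table[Qk] = p
        new_M[key] = new_table
    if not used_lambda:                   # already strictly nonterminating
        return set(states), set(alphabet), new_M
    new_M[(q_t, blank)] = {(q_t,): 1}
    return set(states) | {q_t}, set(alphabet) | {blank}, new_M

states2, alphabet2, M2 = make_strictly_nonterminating(
    {"q"}, {"a"}, {("q", "a"): {(): F(1, 2), ("q", "q"): F(1, 2)}})
```

The total probability of leaving each original state is unchanged, and the new state $q_t$ leaves with probability 1, so normalization is preserved.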


It is useful and convenient to represent these tree automata using a modification of the directed graphs usually used for state diagrams. A context free state diagram consists of a set of vertices, represented by small circles, each of which may be connected to others by incoming arcs (lines with arrowheads) and outgoing cables (heavy lines). The vertices denote states, and cables denote possible elementary transitions. Each cable is labelled with the input symbol ($\in T$) which produces the elementary transition and the probability associated with the transition. Each cable splits into one or more arcs, each of which goes to a state. The initial states (all q such that $\xi(q) \ne 0$) are designated by incoming labelled arcs with no source states and no inputs. In the case of a terminal transition, $\lambda \in M(q,a)$, the automaton literally goes into no state (it stops). In drawing state diagrams, this can be represented by drawing an arc from the vertex q to a dead-end vertex, which has no cables out of it and which does not correspond to any state in Q.


This is a state diagram of the automaton

$$Q = \{q_A, q_B\}, \qquad \Xi = (\xi(q_A), \xi(q_B)) = (1, 0),$$

$$M(q_A, a) = \{q_A, \; q_A q_B\}, \qquad M(q_B, b) = \{q_B, \; q_A q_B\},$$

$$p_{q_A,a}(q_A) = p_a, \quad p_{q_A,a}(q_A q_B) = p'_a, \qquad p_{q_B,b}(q_B) = p_b, \quad p_{q_B,b}(q_A q_B) = p'_b.$$

This automaton over $T = \{a, b\}$ "accepts", among others, the following trees.

Tree 1: $D_1 = \{1^n \mid n \ge 0\}$, with $v_1(d) = a$ for all $d \in D_1$.

Tree 2: $D_2 = D_1 \cup \{1^n 2 1^m \mid n, m \ge 0\}$, with $v_2(d) = a$ if $d \in D_1$ and $v_2(d) = b$ otherwise.

Example 5

Definition: A run of A on a tree (v, D), denoted r(v, D), is a function $r: D \to Q$ such that $\forall d \in D$, $(r(d1), \ldots, r(dn_+)) \in M(r(d), v(d))$, where (a) $\xi(r(\lambda)) \ne 0$ and (b) if $n_+ = 0$, then $\lambda \in M(r(d), v(d))$. If condition (b) is omitted in the case $\ell(d)$ = length of the tree $(v, D) < \infty$, then r(v, D) is a pre-run. The set of all runs on (v, D) is denoted by Rn(v, D).
A transition t is defined as the sequence of elementary transitions $\tau_k$ used in a run to go from some level n of a tree to level n+1. The probability of this event is $p(t) = \prod_{\tau_k \in t} p(\tau_k)$. The sequence of transitions used in a run is denoted by t(r). The response function of a run (or pre-run) r is defined as the product of the probabilities associated with the transitions used in the run (or pre-run) and the initial state probability:

$$rf(r) = \xi(r(\lambda)) \prod_{t \in t(r)} p(t).$$

Definition: A k-prefix of a tree $(v, D)$ is the subtree $(v_k, D_k)$, where $D_k = \{x \in D \mid \ell(x) \le k\}$ and $v_k = v|_{D_k}$. In defining acceptance of trees, it is desirable to accept a prefix as part of a longer tree without requiring the machine to terminate.
Thus pre-runs on prefixes are necessary, and likewise the following definitions. An elementary pre-transition $\bar\tau$ out of state $q$ under input $a$ is the set of all elementary transitions from state $q$ under input $a$. The probability of this event is

$$p(\bar\tau) = \sum_{Q_k \in M(q,a)} p_{q,a}(Q_k).$$

If $M(q, a) = \emptyset$, then $p(\bar\tau) = 0$. The final transition $\bar t_k$ of a pre-run on $(v_k, D_k)$ is the sequence of elementary pre-transitions used to leave the k-th level. The probability of this event is

$$p(\bar t_k) = \prod_{\bar\tau \in \bar t_k} p(\bar\tau).$$

Definition: The behavior of $\hat A$ is the set of all trees $(v, D)$ over $T$ such that there is at least one run of $\hat A$ on $(v, D)$: $B(\hat A) = \{(v, D) \mid Rn(v, D) \ne \emptyset\}$. The k-behavior of $\hat A$, $B_k(\hat A)$, is the set of all k-prefixes $(v_k, D_k)$ over $T$.

Definition: The probability of acceptance of a tree $(v, D)$ is the sum of the response functions of all runs on $(v, D)$,

$$\mu(v, D) = \sum_{r \in Rn(v, D)} rf(r).$$

Define the k-probability of a tree as

$$\mu_k(v, D) = \mu(v_k, D_k) = \sum_{r \in Rn(v_k, D_k)} rf(r),$$

where $Rn(v_k, D_k)$ is the set of all pre-runs on $(v_k, D_k)$. Two trees over $T$ are defined to be k-equivalent, $(v^1, D^1) \equiv_k (v^2, D^2)$, iff $D^1_k = D^2_k$ and $v^1_k = v^2_k$.
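The definitions of pre-run, final transition, and k-probability can be traced mechanically. The Python sketch below uses the example automaton of the state diagram with hypothetical numeric values for $p_a, p_a', p_b, p_b'$, and assumes node addresses are strings of single branch digits with the empty string standing for the root $\lambda$. It sums $rf(r)$ over all pre-runs on a given k-prefix: nodes at level $k$ contribute the probability of their final pre-transition, leaves above level $k$ need $\lambda \in M(q, a)$, and every other node must move to a successor tuple whose length matches its number of children.

```python
from fractions import Fraction
from math import prod

# The example automaton, with hypothetical values for p_a, p_a', p_b, p_b'.
XI = {"qA": Fraction(1), "qB": Fraction(0)}
M = {
    ("qA", "a"): {("qA",): Fraction(1, 3), ("qA", "qB"): Fraction(2, 3)},
    ("qB", "b"): {("qB",): Fraction(1, 2), ("qA", "qB"): Fraction(1, 2)},
}


def children(tree, d):
    """Successors d1, d2, ... of node d that are present in the tree."""
    kids, i = [], 1
    while d + str(i) in tree:
        kids.append(d + str(i))
        i += 1
    return kids


def k_probability(tree, k, xi=XI, m=M):
    """mu(v_k, D_k): sum of rf(r) over all pre-runs r on the k-prefix of
    `tree` (a dict from node address to symbol, '' being the root lambda)."""

    def weight(d, q):
        # Sum over pre-runs of the subtree at d with r(d) = q of the product
        # of the transition probabilities used inside that subtree.
        trans = m.get((q, tree[d]), {})
        if len(d) == k:  # level k: final elementary pre-transition
            return sum(trans.values(), Fraction(0))
        kids = children(tree, d)
        if not kids:  # leaf above level k: condition (b), lambda in M(q, a)
            return trans.get((), Fraction(0))
        return sum(
            (p * prod(weight(c, s) for c, s in zip(kids, succ))
             for succ, p in trans.items() if len(succ) == len(kids)),
            Fraction(0),
        )

    # rf(r) starts with the initial-state factor xi(r(lambda)).
    return sum((x * weight("", q) for q, x in xi.items()), Fraction(0))


print(k_probability({"": "a", "1": "a", "2": "b"}, 1))   # 2/3, i.e. p_a'
print(k_probability({"": "a", "1": "a", "11": "a"}, 2))  # 1/9, i.e. p_a squared
```

Nodes below level $k$ are never visited, so any finite tree with the same k-prefix yields the same value, in line with Theorem 11.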

Theorem 10: $\equiv_k$ is an equivalence relation.

Proof: It is easy to verify that k-equivalence is reflexive, symmetric, and transitive.


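Theorem 11: $(v^1, D^1) \equiv_k (v^2, D^2) \Rightarrow \mu_k(v^1, D^1) = \mu_k(v^2, D^2)$ over $T$.

Proof: Let $D^1_k = D^2_k = D_k$ and $v^1_k = v^2_k = v_k$. Then, by the definition of k-probability, $\mu_k(v^1, D^1) = \mu(v_k, D_k) = \mu_k(v^2, D^2)$.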


Theorem 12: If $\hat A$ is a normalized PTA, then

$$\sum_{(v_k, D_k) \in B_k(\hat A)} \mu(v_k, D_k) = 1, \quad k = 0, 1, 2, \ldots$$

Proof: Let $\sum rf(r_k)$ denote

$$\sum_{(v_k, D_k) \in B_k(\hat A)} \ \sum_{r \in Rn(v_k, D_k)} rf(r) = \sum_{(v_k, D_k) \in B_k(\hat A)} \mu(v_k, D_k).$$

The proof is by induction on $k$.

(1) Case $k = 0$: For each pre-run, there is only a single transition, which is a final transition consisting of one elementary pre-transition, because the 0-th level of a tree is only one node, $\lambda$. Given any $q \in Q$, suppose there are $m$ inputs $a_j \in T$ such that $M(q, a_j) \ne \emptyset$. Then there are $m$ prefixes $a_1, a_2, \ldots, a_m$ which each have a pre-run of the form $r: \lambda \mapsto q$, and for the tree $a_j$ the response function is $\xi(q) \sum_{Q_i \in M(q, a_j)} p_{q,a_j}(Q_i)$. The sum of these $m$ response functions is

$$\sum_{a_j \in T} \xi(q) \sum_{Q_i \in M(q, a_j)} p_{q,a_j}(Q_i),$$

which by normalization of $\hat A$ reduces to $\xi(q)$. Finally, summing over all states $q$, we get

$$\sum_{q \in Q} \xi(q) \sum_{a_j \in T} \ \sum_{Q_i \in M(q, a_j)} p_{q,a_j}(Q_i) = \sum_{q \in Q} \xi(q) = 1$$

by the normalization criteria.

(2) Case $k > 0$: Assume $\sum_{(v_{k-1}, D_{k-1}) \in B_{k-1}(\hat A)} \mu(v_{k-1}, D_{k-1}) = 1$. The set of $(k-1)$-prefixes of trees in $B(\hat A)$ partitions the set of k-prefixes via the equivalence relation $\equiv_{k-1}$. This is indeed a partition because each $(v_k, D_k)$ has some $(v_{k-1}, D_{k-1})$ as prefix, so $\bigcup_{i=1}^{l} E_i$ includes all prefixes $(v_k, D_k)$, where $E_i$ is an equivalence class of k-prefixes all having the same $(k-1)$-prefix and $l$ is the number of equivalence classes created by $\equiv_{k-1}$. Furthermore, $E_i \cap E_j = \emptyset$ for $i \ne j$, because each $(v_k, D_k)$ has a unique $(k-1)$-prefix. For every class $E_i$, all members have the same $(k-1)$-probability by theorem 11. For a particular pre-run of some $(v_k, D_k) \in E_i$, the transition from level $k-1$ to level $k$ of the tree yields a sequence of states $q_1 \ldots q_n$. All possible transitions emanating from this set of states to possible level $k+1$ trees have a sum probability of

$$\sum_{k_1=1}^{m_1} \sum_{k_2=1}^{m_2} \cdots \sum_{k_n=1}^{m_n} p_{k_1} p_{k_2} \cdots p_{k_n}, \qquad m_i = \sum_{a \in T} |M(q_i, a)|,$$

where each value of $p_{k_i}$ is associated with one of the transitions out of state $q_i$. Thus, summing over all pre-runs of all k-prefixes, the result is

$$\sum rf(r_k) = \sum \Big( rf(r_{k-1}) \sum_{k_1=1}^{m_1} \cdots \sum_{k_n=1}^{m_n} \prod_{i=1}^{n} p_{k_i} \Big).$$

The inner sums can be written

$$\sum_{k_1=1}^{m_1} p_{k_1} \Big( \sum_{k_2=1}^{m_2} p_{k_2} \cdots \Big( \sum_{k_n=1}^{m_n} p_{k_n} \Big) \cdots \Big) = 1,$$

since by normalization the total probability of leaving state $q_i$ is $\sum_{k_i=1}^{m_i} p_{k_i} = 1$. Thus

$$\sum_{(v_k, D_k) \in B_k(\hat A)} \mu(v_k, D_k) = \sum rf(r_k) = \sum rf(r_{k-1}) = \sum_{(v_{k-1}, D_{k-1}) \in B_{k-1}(\hat A)} \mu(v_{k-1}, D_{k-1}),$$

and by the induction hypothesis the right-hand side equals 1, so $\sum_{(v_k, D_k) \in B_k(\hat A)} \mu(v_k, D_k) = 1$.
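The induction above can be mirrored numerically: summing $rf$ over all pre-runs of all k-prefixes amounts to expanding the automaton level by level, choosing an input symbol and a successor tuple at every node above level $k$ and taking the whole final pre-transition at level $k$. A minimal Python sketch, assuming the example automaton of the state diagram with hypothetical values for $p_a, p_a', p_b, p_b'$:

```python
from fractions import Fraction
from functools import lru_cache
from math import prod

ALPHABET = ("a", "b")
# Example automaton with hypothetical values for p_a, p_a', p_b, p_b'.
XI = {"qA": Fraction(1), "qB": Fraction(0)}
M = {
    ("qA", "a"): {("qA",): Fraction(1, 3), ("qA", "qB"): Fraction(2, 3)},
    ("qB", "b"): {("qB",): Fraction(1, 2), ("qA", "qB"): Fraction(1, 2)},
}


@lru_cache(maxsize=None)
def expand(q, depth):
    """Sum over all depth-prefixes read from state q, and all pre-runs on
    them, of the product of the transition probabilities below q."""
    total = Fraction(0)
    for a in ALPHABET:
        for succ, p in M.get((q, a), {}).items():
            if depth == 0:
                total += p  # final pre-transition out of the last level
            else:
                # prod(()) == 1, so a lambda transition simply contributes p.
                total += p * prod(expand(s, depth - 1) for s in succ)
    return total


def sum_over_k_behavior(k):
    """Left-hand side of Theorem 12: sum of mu(v_k, D_k) over B_k(A)."""
    return sum((x * expand(q, k) for q, x in XI.items()), Fraction(0))


print(all(sum_over_k_behavior(k) == 1 for k in range(6)))  # True
```

Exact `Fraction` arithmetic is used so that the check is an equality rather than a floating-point approximation.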
Theorem 13 (Kolmogorov): Let $B_k(\hat A)$ be probability spaces, $k = 0, 1, 2, \ldots$; let all of these spaces be consistent. Then $B(\hat A)$ forms a probability space consistent with the $B_k(\hat A)$. (In other words, a consistent specification of all $\mu_k$ determines $\mu$ uniquely.)
Lemma 13.1: Given any normalized PCF tree automaton $\hat A$ over $T$, the set $B_k(\hat A)$ forms a probability space for each $k \in I$.

Proof: A probability space consists of a set $\Omega$, a sigma field $\mathfrak{F}$ of allowable events, which can always be chosen as $P(\Omega)$, the power set of $\Omega$, if $\Omega$ is countable, and a probability measure $\nu$.

Assign $(\Omega_k, \mathfrak{F}_k, \nu_k)$ as follows:

$\Omega_k = B_k(\hat A)$,

$\mathfrak{F}_k = P(\Omega_k)$,

$\nu_k(B) = \sum_{(v_k, D_k) \in B} \mu(v_k, D_k)$ for all $B \subseteq B_k(\hat A)$.

The events $e_i$ of $\Omega_k$ are sets of k-prefixes. By the previous theorem, $\nu_k(\Omega_k) = \sum_{(v_k, D_k) \in B_k(\hat A)} \mu(v_k, D_k) = 1$. Obviously, $\nu_k(B \cup B') = \sum_{(v_k, D_k) \in B \cup B'} \mu(v_k, D_k)$. Assuming $B \cap B' = \emptyset$, we can write

$$\nu_k(B \cup B') = \sum_{(v_k, D_k) \in B} \mu(v_k, D_k) + \sum_{(v_k, D_k) \in B'} \mu(v_k, D_k) = \nu_k(B) + \nu_k(B').$$

Thus $\nu_k$ is finitely additive. $B_k(\hat A)$ has only a finite number of subsets, so finite additivity is equivalent to countable additivity. These conditions verify that $\nu_k$ is indeed a probability measure.

Lemma 13.2: The spaces $B_k(\hat A)$ are consistent in the sense that if $k < \ell$, then $\nu_k(B) = \nu_\ell(B')$ for any $B \subseteq B_k(\hat A)$, where $B' \subseteq B_\ell(\hat A)$ is the set of all trees $(v_\ell, D_\ell)$ in $B_\ell(\hat A)$ that have a k-prefix $(v_k, D_k)$ in $B$.
Proof (of Lemma 13.2):

(a) Assume $\ell = k+1$. The trees of $B$ can be extended by a single transition to trees of length $k+1$. By theorem 12,

$$\sum_{(v_{k+1}, D_{k+1}) \in B_{k+1}(\hat A)} \mu(v_{k+1}, D_{k+1}) = \sum_{(v_k, D_k) \in B_k(\hat A)} \mu(v_k, D_k).$$

An analogous argument establishes that this equality still holds when $B_k(\hat A)$ is replaced by $B$ and $B_{k+1}(\hat A)$ by $B'$. Thus $\nu_k(B) = \nu_\ell(B')$.

(b) In case $\infty > \ell > k+1$, apply part (a) $\ell - k$ times.

Proof (of Theorem 13):

Choose $\Omega = T^\infty$. The set of all trees k-equivalent to $(v, D)$ is called the Borel cylinder over $(v_k, D_k)$. Define the probability of this Borel cylinder as $\nu\{(v', D') \mid (v', D') \equiv_k (v, D)\} = \mu_k(v, D)$. Choose as $\mathfrak{F}$ the smallest sigma field over $T^\infty$ containing all Borel cylinders. Specification of the probabilities of all Borel cylinders for all $k$ completely specifies $\nu$ on $T^\infty$. Since k-equivalence is an equivalence relation (theorem 10), the definition does not depend upon the particular $(v, D)$ chosen as representative of the equivalence class. This and the consistency of all $B_k(\hat A)$ assure that $\nu$ is well-defined. It yields values for all Borel cylinders and therefore for all measurable subsets of $T^\infty$. Finally, $\nu$ really is a probability measure because $\nu$ is countably additive and $\nu(\Omega) = \nu(B(\hat A)) = \nu_k(B_k(\hat A)) = 1$.

There is a correspondence between PTA's and Greibach normal form P grammars which becomes apparent by considering state diagrams. Each vertex $q$ corresponds to a nonterminal $A$ except for $q_\lambda$. Each cable leaving $q$ corresponds to a production with $A$ as generatrix. The various states to which the arcs go correspond to the nonterminals in the replacement string of the production. The terminal symbol and probability of the production are the labels of the corresponding cable. In the example given of a state diagram, the corresponding P grammar would be $G = (N, P, \Delta)$, where $N = \{A, B\}$, $\Delta = (\delta_A, \delta_B) = (1, 0)$, and

$$P = \{A \xrightarrow{p_a} aA,\ A \xrightarrow{p_a'} aAB,\ B \xrightarrow{p_b} bB,\ B \xrightarrow{p_b'} bAB\}.$$

By changing the requirement that all derivations proceed from left to right to a requirement that derivations from all nonterminals in a string occur simultaneously, the P grammar in Greibach normal form becomes a generator of trees, where a production $A \xrightarrow{p} aB_1 B_2 \ldots B_n$ generates a node labeled $(a, p)$ with $n$ branches.

[Figure: a node labeled (a, p) with n branches]
The P language generated is accepted by the PTA given in the above correspondence. This correspondence also shows that tree automata can be viewed as acceptors of strings. If a set $B$ of finite trees over $T$ is in the behavior of $\hat A$, the corresponding set of strings $x \in T^*$ is found by forming parenthesized strings from trees (see Brainerd [2]) and then removing the parentheses. Alternatively, the process can be described as follows: for each finite tree $(v, D)$, add zeroes to the right end of each $d' \in D$ until it has length $\ell(d') = \max_{d \in D} \ell(d)$. Consider these strings as integers and order them $d_0, d_1, \ldots, d_k$ so that $d_{i-1} < d_i$ for $i = 1, 2, \ldots, k$. Then form $x = (v(d_0), v(d_1), \ldots, v(d_k))$ as the corresponding string. The following is an example of this process.
Example 5:

[Figure: the valued tree and its underlying tree]

$D = \{\lambda, 1, 2, 11, 12, 111\}$

$v(\lambda) = b$, $v(1) = c$, $v(2) = b$,

$v(11) = b$, $v(12) = c$, $v(111) = b$.
Expanding $D$ yields 000, 100, 200, 110, 120, 111.
Ordering this set yields 000, 100, 110, 111, 120, 200.

The corresponding string is $(b, c, b, b, c, b)$.
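The zero-padding procedure is easy to mechanize. The Python sketch below assumes, as in Example 5, that node addresses are strings of branch digits 1 to 9 with the empty string standing for the root $\lambda$; with ten or more branches per node the digit encoding would need a different alphabet.

```python
def tree_to_string(v):
    """Return the string corresponding to a finite valued tree.

    v maps each node address d in D (a string over the digits 1-9, with ''
    for the root lambda) to its value v(d).
    """
    width = max(len(d) for d in v)
    # Pad every address with zeroes on the right to the common length and
    # order the padded addresses as integers ('' pads to all zeroes).
    order = sorted(v, key=lambda d: int(d.ljust(width, "0") or "0"))
    return "".join(v[d] for d in order)


example_5 = {"": "b", "1": "c", "2": "b", "11": "b", "12": "c", "111": "b"}
print(tree_to_string(example_5))  # bcbbcb
```

Because addresses never contain the digit 0, padding keeps them distinct, and the resulting order visits a node before its descendants and subtrees from left to right (preorder).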


Thus, the sets of strings accepted by tree automata over $T$ are exactly the context free languages. Next we state this correspondence formally.

Theorem 14: For every P grammar $\hat G$ in Greibach normal form, there is a PTA $\hat A$ which accepts the P language generated by $\hat G$ and, conversely, for every PTA $\hat A$, there exists a $\hat G$ in GNF which generates the P language accepted by $\hat A$.

Proof: The proof is a construction identical to that of theorem 3, using

(1) $Q = N$,

(2) $\xi = \Delta$,

(3) $q_{B_1} q_{B_2} \ldots q_{B_n} \in M(q_A, a)$ with $p_{q_A,a}(q_{B_1} q_{B_2} \ldots q_{B_n}) = p$ iff $(A \xrightarrow{p} aB_1 B_2 \ldots B_n) \in P$,

(4) $\lambda \in M(q_A, a)$ with $p_{q_A,a}(\lambda) = p$ iff $(A \xrightarrow{p} a) \in P$,

(5) $M(q_A, a) = \emptyset$ iff there are no productions $A \to aX$ in $P$ for any $X \in N^*$.
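Steps (1) to (5) translate directly into code. The Python sketch below encodes each production $A \xrightarrow{p} aB_1 \ldots B_n$ as a tuple `(A, p, a, (B1, ..., Bn))`, with an empty tuple for $A \xrightarrow{p} a$ (standing for $\lambda \in M(q_A, a)$), and names the state for nonterminal $A$ as `"q" + A`; this encoding, the helper names, and the numeric probabilities for the example grammar are assumptions made for illustration.

```python
from fractions import Fraction


def state(nonterminal):
    """Step (1): the state q_A corresponding to nonterminal A."""
    return "q" + nonterminal


def gnf_to_pta(productions, delta):
    """Build (xi, M) from a P grammar in Greibach normal form.

    productions: iterable of (A, p, a, (B1, ..., Bn)) for A -p-> a B1...Bn;
    an empty right-hand tuple encodes A -p-> a.
    delta: initial nonterminal vector {A: delta_A}.
    """
    xi = {state(A): d for A, d in delta.items()}  # step (2): xi = Delta
    m = {}
    for A, p, a, rhs in productions:
        succ = tuple(state(B) for B in rhs)  # step (3); () is lambda, step (4)
        m.setdefault((state(A), a), {})[succ] = p
    # Step (5): a pair (q_A, a) with no production never appears as a key,
    # i.e. M(q_A, a) is empty.
    return xi, m


# The example grammar, with hypothetical values for p_a, p_a', p_b, p_b'.
PRODUCTIONS = [
    ("A", Fraction(1, 3), "a", ("A",)),
    ("A", Fraction(2, 3), "a", ("A", "B")),
    ("B", Fraction(1, 2), "b", ("B",)),
    ("B", Fraction(1, 2), "b", ("A", "B")),
]
xi, m = gnf_to_pta(PRODUCTIONS, {"A": Fraction(1), "B": Fraction(0)})
print(m[("qA", "a")])  # {('qA',): Fraction(1, 3), ('qA', 'qB'): Fraction(2, 3)}
```

Applied to the example grammar, the result reproduces the transition table of the state diagram.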

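Corollary 14.1: For every normalized P grammar $\hat G$ in Greibach Normal Form (GNF), there is a normalized PTA $\hat A$ such that $L(\hat G) = L(\hat A)$ and, conversely, for every normalized PTA $\hat A$, there is a normalized $\hat G$ in GNF such that $L(\hat A) = L(\hat G)$.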


Corollary 14.2: For every admissible P grammar $\hat G$ in GNF, there is a terminating PTA $\hat A$ such that $L(\hat G) = L(\hat A)$ and, conversely, for every terminating PTA $\hat A$, there is an admissible $\hat G$ in GNF such that $L(\hat A) = L(\hat G)$.

Corollary 14.3: For every generalized admissible $\hat G$ in GNF, there is a complete $\hat A$ such that $L(\hat G) = L(\hat A)$ and, conversely, for every complete $\hat A$, there is a $\hat G$ in GNF such that $L(\hat A) = L(\hat G)$.

Proof: These corollaries follow immediately from the construction in the proof of theorem 14.
Define the P language accepted by the (strictly nonterminating) probabilistic tree automaton $\hat A$ as the probabilistic language $(T^\infty, \mu)$, where $\hat A$ determines $\mu_k$ for all $k$, and these determine $\mu$ on $B(\hat A)$; define $\mu(T^\infty - B(\hat A)) = 0$.

Definition: The union with weighting vector $w$ of the P languages $\hat L_i = (T^\infty, \mu_i)$, $i = 1, \ldots, n$, is $\hat L = (T^\infty, \mu)$, where

$$\mu = \sum_{i=1}^{n} w_i \mu_i.$$


Theorem 14: The union with weighting vector $w$ of P languages $\hat L_1, \hat L_2, \ldots, \hat L_n$ forms a P language. Furthermore, if $\hat L_i$ is normalized, $i = 1, \ldots, n$, and $w$ is a stochastic vector, then the union $\hat L = \bigcup_{i=1}^{n} w_i \hat L_i$ is normalized.
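Proof:

(a) Let $B_1, B_2, \ldots$ be a countable number of disjoint measurable sets of $T^\infty$. Then

$$\begin{aligned}
\mu\Big(\bigcup_{j=1}^{\infty} B_j\Big) &= \sum_{i=1}^{n} w_i\, \mu_i\Big(\bigcup_{j=1}^{\infty} B_j\Big) && \text{by definition of } \mu \\
&= \sum_{i=1}^{n} w_i \sum_{j=1}^{\infty} \mu_i(B_j) && \text{by the countable additivity of } \mu_i \\
&= \sum_{j=1}^{\infty} \sum_{i=1}^{n} w_i\, \mu_i(B_j) && \text{by algebraic manipulation} \\
&= \sum_{j=1}^{\infty} \mu(B_j) && \text{by definition.}
\end{aligned}$$

Thus $\mu$ is a measure on $T^\infty$, and $\hat L$ is a P language.

(b)

$$\begin{aligned}
\mu(\Omega) &= \sum_{i=1}^{n} w_i\, \mu_i(\Omega) && \text{by definition} \\
&= \sum_{i=1}^{n} w_i && \text{since } \mu_i(\Omega) = 1,\ i = 1, 2, \ldots, n \\
&= 1 && \text{since } w \text{ is stochastic.}
\end{aligned}$$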
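Part (b) can be checked on any finite level of cylinders. The Python sketch below forms $\mu = \sum_i w_i \mu_i$ on cylinder probabilities; the two input distributions, keyed by hypothetical labels for k-prefixes, and the weighting vector are made-up values used only for illustration.

```python
from fractions import Fraction


def weighted_union(weights, measures):
    """Cylinder probabilities of the union with weighting vector w:
    mu(C) = sum_i w_i * mu_i(C) for every cylinder label C."""
    mu = {}
    for w, mu_i in zip(weights, measures):
        for cylinder, p in mu_i.items():
            mu[cylinder] = mu.get(cylinder, 0) + w * p
    return mu


# Hypothetical normalized cylinder probabilities of two P languages.
mu1 = {"a(a)": Fraction(1, 3), "a(ab)": Fraction(2, 3)}
mu2 = {"a(a)": Fraction(1, 2), "b(b)": Fraction(1, 2)}
w = [Fraction(1, 4), Fraction(3, 4)]  # a stochastic weighting vector

mu = weighted_union(w, [mu1, mu2])
print(mu["a(a)"], sum(mu.values()))  # 11/24 1
```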

If $(v_1, D_1)$ and $(v_2, D_2)$ are trees, then $a((v_1, D_1), (v_2, D_2))$ is a tree whose root has value $a$ and which has two branches going out to $(v_1, D_1)$ and $(v_2, D_2)$, respectively.

[Figure: a root labeled a with subtrees (v_1, D_1) and (v_2, D_2)]

The generalization of this operation is formalized as follows.


Definition: The concatenation under $b$ of P languages $\hat L_i = (T^\infty, \mu^i)$, $i = 1, \ldots, n$, is $\hat L = (T^\infty, \mu)$, defined as follows: for each combination of $(v_1, D_1) \in \hat L_1, \ldots, (v_n, D_n) \in \hat L_n$, define a tree $(v, D)$ with $D = \{\lambda\} \cup 1D_1 \cup \cdots \cup nD_n$, where $kD_k$ means all strings in $D_k$ prefixed by $k \in I$; and define $v(\lambda) = b$, $v(kd) = v_k(d)$, where $d \in D_k$, $k \in I$. Define

$$\mu_k(v, D) = \prod_{i=1}^{n} \mu^i_{k-1}(v_i, D_i), \quad k = 1, 2, \ldots,$$

and $\mu_0(v, D) = 1$. Also define $\mu_k((v, D) + (v', D')) = \mu_k(v, D) + \mu_k(v', D')$, provided $(v, D) \not\equiv_k (v', D')$. This guarantees that $\mu_k$ is finitely additive. By simply omitting the definitions for $\mu_k$, concatenation under $b$ can be defined for nonprobabilistic languages.

Theorem 15: The concatenation under $b$ of P languages $\hat L_1, \hat L_2, \ldots, \hat L_n$ forms a P language for any $b \in T$. Furthermore, if $\hat L_i$ is normalized, $i = 1, \ldots, n$, then the concatenation $\hat L = b \circ (\hat L_1, \ldots, \hat L_n)$ is normalized.

Proof: The k-probabilities of trees, $\mu_k(v, D)$, were defined as $\prod_{i=1}^{n} \mu^i_{k-1}(v_i, D_i)$, so consider the total sum for some $k$. If $k = 0$, the sum is 1. If $k > 0$, then

$$\mu_k(\Omega_k) = \sum_{(v_k, D_k) \in \Omega_k} \mu(v_k, D_k),$$

since $\mu_k$ was previously defined so as to be finitely additive on the finite set $\Omega_k$. Hence

$$\mu_k(\Omega_k) = \sum_{\substack{(v^i_{k-1}, D^i_{k-1}) \in \Omega^i_{k-1} \\ i = 1, \ldots, n}} \ \prod_{i=1}^{n} \mu^i(v^i_{k-1}, D^i_{k-1}) = \prod_{i=1}^{n} \ \sum_{(v^i_{k-1}, D^i_{k-1}) \in \Omega^i_{k-1}} \mu^i(v^i_{k-1}, D^i_{k-1}) = \prod_{i=1}^{n} 1 = 1,$$

assuming each $\mu^i$ is a probability measure. Thus each $(\Omega_k, \mu_k)$ forms a probability space. By theorem 13, a consistent specification of all $\mu_k$ determines $\mu$. $\hat L$ is therefore an NP language.
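The key step of the proof, exchanging a sum of products for a product of sums, can be seen on a small example. This Python sketch builds the k-level cylinder probabilities $\mu_k(v, D) = \prod_i \mu^i_{k-1}(v_i, D_i)$ of $b \circ (\hat L_1, \ldots, \hat L_n)$ from (k−1)-level probabilities of each $\hat L_i$; the input dictionaries, keyed by hypothetical prefix labels, are illustrative values, not data from the text.

```python
from fractions import Fraction
from itertools import product
from math import prod


def concatenate(b, measures):
    """k-level probabilities of b o (L_1, ..., L_n): one tree per combination
    of (k-1)-prefixes, with probability equal to the product of theirs."""
    out = {}
    for combo in product(*(mu_i.items() for mu_i in measures)):
        subtrees = tuple(label for label, _ in combo)
        out[(b, subtrees)] = prod(p for _, p in combo)
    return out


# Hypothetical (k-1)-level probabilities of two normalized P languages.
mu1 = {"a": Fraction(1, 3), "c": Fraction(2, 3)}
mu2 = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

mu = concatenate("b", [mu1, mu2])
print(len(mu), sum(mu.values()))  # 4 1
```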

Definition: The direct w-sum of the PTA's $\hat A_1, \hat A_2, \ldots, \hat A_n$, where $\hat A_i = (Q_i, M_i, \xi_i)$, is the PTA $\hat A = \bigoplus_w \hat A_i = (Q, M, \xi)$, where $Q = \bigcup_{i=1}^{n} Q_i$ (assuming without loss of generality that $Q_i \cap Q_j = \emptyset$ if $i \ne j$), $M(q, a) = M_i(q, a)$ and $p_{q,a} = p^i_{q,a}$ for all $q \in Q_i$ ($p^i_{q,a}$ is the probability associated with $M_i$), and $\xi(q_i) = w_i \xi_i(q_i)$ for $q_i \in Q_i$.


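Theorem 16: If $\hat A_1, \hat A_2, \ldots, \hat A_n$ are normalized PTA's and $w$ is a stochastic vector, then $\hat A = \bigoplus_w \hat A_i$ is normalized.

Proof: Since the transitions of each $\hat A_i$ are carried over unchanged, one only needs to show that $\xi$ is a stochastic vector, where

$$\xi = (w_1 \xi_1(q_{11}), w_1 \xi_1(q_{12}), \ldots, w_1 \xi_1(q_{1k_1}), w_2 \xi_2(q_{21}), \ldots, w_n \xi_n(q_{nk_n})),$$

$q_{ij} \in Q_i$, and $k_i = |Q_i|$, $i = 1, 2, \ldots, n$. Then

$$|\xi| = \sum_{q \in Q} \xi(q) = \sum_{i=1}^{n} \sum_{q_{ij} \in Q_i} w_i\, \xi_i(q_{ij}) = \sum_{i=1}^{n} w_i \Big( \sum_{q_{ij} \in Q_i} \xi_i(q_{ij}) \Big) = \sum_{i=1}^{n} w_i = 1.$$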
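The construction of $\bigoplus_w$ and the computation of $|\xi|$ can be sketched in Python. Here a PTA is represented as a hypothetical pair of dictionaries $(\xi_i, M_i)$ whose state names are already disjoint, and the two small automata and the weights are illustrative assumptions.

```python
from fractions import Fraction


def direct_w_sum(weights, automata):
    """Direct w-sum of PTA's given as (xi_i, M_i) pairs: transitions are
    copied unchanged and xi(q) = w_i * xi_i(q) for q in Q_i."""
    xi, m = {}, {}
    for w, (xi_i, m_i) in zip(weights, automata):
        for q, x in xi_i.items():
            if q in xi:
                raise ValueError(f"state {q!r} occurs in two automata")
            xi[q] = w * x
        m.update(m_i)
    return xi, m


# Two small normalized PTA's (hypothetical); () is the lambda transition.
A1 = ({"p0": Fraction(1)}, {("p0", "a"): {(): Fraction(1)}})
A2 = (
    {"r0": Fraction(1, 2), "r1": Fraction(1, 2)},
    {("r0", "b"): {("r1",): Fraction(1)}, ("r1", "b"): {(): Fraction(1)}},
)

xi, m = direct_w_sum([Fraction(1, 3), Fraction(2, 3)], [A1, A2])
print(sum(xi.values()))  # 1
```

Raising an error on shared state names stands in for the "without loss of generality" renaming in the definition.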

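Definition: The direct b-product of $\hat A_1, \hat A_2, \ldots, \hat A_n$, $b \in T$, where $\hat A_i$ is the PTA $(Q_i, M_i, \xi_i)$, is $\bigotimes_b \hat A_i = (Q, M, \xi)$, where

$Q = \bigcup_{i=1}^{n} Q_i \cup \{q_0\}$ with $q_0 \notin Q_i$ and $Q_i \cap Q_j = \emptyset$, $i, j = 1, \ldots, n$, $i \ne j$;

$\xi(q_0) = 1$, $\xi(q) = 0$ if $q \ne q_0$;

$M(q, a) = M_i(q, a)$ and $p_{q,a}(Q_k) = p^i_{q,a}(Q_k)$ if $q \in Q_i$; $M(q_0, a) = \emptyset$ if $a \ne b$; for all possible combinations $q_1 q_2 \ldots q_n$ such that $q_i \in Q_i$ and $\xi_i(q_i) > 0$, $q_1 q_2 \ldots q_n \in M(q_0, b)$ and $p_{q_0,b}(q_1 q_2 \ldots q_n) = \prod_{i=1}^{n} \xi_i(q_i)$.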

Theorem 17: If $\hat A_1, \hat A_2, \ldots, \hat A_n$ are normalized PTA's, then $\hat A = \bigotimes_b \hat A_i$ is normalized.

Proof: By definition $\xi$ is a stochastic vector, and since the state sets $Q_i$ are disjoint, each $q_i \in Q_i$ satisfies the normalization criterion because $\hat A_i$ is normalized. Finally, since combinations with some $\xi_i(q_i) = 0$ contribute nothing to the sum,

$$\sum_{q_1 \ldots q_n \in M(q_0, b)} p_{q_0,b}(q_1 \ldots q_n) = \sum_{q_1 \in Q_1} \sum_{q_2 \in Q_2} \cdots \sum_{q_n \in Q_n} \prod_{i=1}^{n} \xi_i(q_i) = \sum_{q_1 \in Q_1} \xi_1(q_1) \Big( \sum_{q_2 \in Q_2} \xi_2(q_2) \cdots \Big( \sum_{q_n \in Q_n} \xi_n(q_n) \Big) \cdots \Big) = 1,$$

since $\sum_{q_{ij} \in Q_i} \xi_i(q_{ij}) = 1$ for each $Q_i$.
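The new root transitions of the direct b-product, and the sum that the proof evaluates, can be computed directly. A Python sketch, assuming initial state vectors are given as dictionaries from state names to probabilities (the two vectors in the example are made up):

```python
from fractions import Fraction
from itertools import product
from math import prod


def root_transitions(initial_vectors):
    """p_{q0,b}(q1...qn) = prod_i xi_i(q_i), for every combination with
    xi_i(q_i) > 0 (one entry of M(q0, b) per combination)."""
    supports = [[(q, x) for q, x in xi_i.items() if x > 0] for xi_i in initial_vectors]
    return {
        tuple(q for q, _ in combo): prod(x for _, x in combo)
        for combo in product(*supports)
    }


xi1 = {"p0": Fraction(1, 3), "p1": Fraction(2, 3)}
xi2 = {"r0": Fraction(1), "r1": Fraction(0)}
trans = root_transitions([xi1, xi2])
print(sorted(trans), sum(trans.values()))  # [('p0', 'r0'), ('p1', 'r0')] 1
```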

Theorem 18: There exists an operator homomorphism $h$ from the set of strictly nonterminating PTA's over $T$ into the set of P languages of the form $(T^\infty, \mu)$ under the operations $(\bigoplus_w, \bigotimes_b)$ and $(\cup_w, \circ_b)$, where $\cup_w$ means union with weighting vector $w$ and $\circ_b$ means concatenation under $b$.

Proof:

(a) Define, for every PTA $\hat A$, $h(\hat A)$ as the language $(T^\infty, \mu)$ which is accepted by $\hat A$. The restrictions of complete and strictly nonterminating guarantee that $\hat A$ accepts some set of infinite trees.

these distributions are equivalent on all Borel cylinders, so they are equivalent on all measurable subsets of $T^\infty$. Hence $\mu_1 = \mu_2$, so every PTA accepts exactly one language.

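(b) If $\hat A_i$ accepts $(T^\infty, \mu^i)$, $i = 1, \ldots, n$, and $(T^\infty, \mu) = \cup_w (T^\infty, \mu^i)$, then by definition $\mu(v, D) = \sum_{i=1}^{n} w_i \mu^i(v, D)$ for all trees $(v, D)$. Because the state sets $Q_i$ are disjoint and every transition of $\hat A$ out of a state in $Q_i$ stays in $Q_i$, each pre-run of $\hat A$ is a pre-run of exactly one $\hat A_i$, with $\xi(r(\lambda)) = w_i \xi_i(r(\lambda))$. Hence the probability of acceptance by $\bigoplus_w \hat A_i$ of an arbitrary Borel cylinder $(v_k, D_k)$ is

$$\begin{aligned}
\sum_{r \in Rn_{\hat A}(v_k, D_k)} rf_{\hat A}(r) &= \sum_{i=1}^{n} \ \sum_{r \in Rn_{\hat A_i}(v_k, D_k)} rf_{\hat A}(r) \\
&= \sum_{i=1}^{n} w_i \sum_{r \in Rn_{\hat A_i}(v_k, D_k)} \xi_i(r(\lambda)) \prod_{t \in t(r)} p^i(t) \\
&= \sum_{i=1}^{n} w_i \sum_{r \in Rn_{\hat A_i}(v_k, D_k)} rf_{\hat A_i}(r) \\
&= \sum_{i=1}^{n} w_i\, \mu^i_k(v, D).
\end{aligned}$$

Since this calculation was carried out for an arbitrary tree $(v, D)$, this implies that the direct sum of P automata, $\bigoplus_w \hat A_i$, accepts the union with weighting vector $w$ of the P languages generated by the P automata.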
(c) Suppose $(v_k, D_k)$ is an arbitrary cylinder which has a pre-run on the PTA $\hat A = \bigotimes_b \hat A_i$, $i = 1, \ldots, n$. Any pre-run has a first transition from $q_0$ to $q_1 q_2 \ldots q_n$, $q_i \in Q_i$. The probability of this is $\prod_{i=1}^{n} \xi_i(q_i)$. After this, the other transitions out of state $q_i$ correspond to a pre-run $r_i$ on $\hat A_i$, so $r_i$ must accept a prefix $(v^i_{k-1}, D^i_{k-1})$ contained in $\hat L_i$. The response function $rf_{\hat A_i}(r_i)$ of this pre-run already contains the initial factor $\xi_i(q_i)$, so the product of its transition probabilities is $rf_{\hat A_i}(r_i)/\xi_i(q_i)$. The total probability associated with the pre-run on $\hat A$ is therefore

$$\prod_{i=1}^{n} \xi_i(q_i) \cdot \prod_{i=1}^{n} \frac{rf_{\hat A_i}(r_i)}{\xi_i(q_i)} = \prod_{i=1}^{n} rf_{\hat A_i}(r_i).$$

Summing over all pre-runs yields

$$\mu(v_k, D_k) = \prod_{i=1}^{n} \mu^i(v^i_{k-1}, D^i_{k-1}).$$

This is, by definition, the probability associated with the prefix $(v_k, D_k)$ contained in $\hat L = b \circ (\hat L_1, \hat L_2, \ldots, \hat L_n)$.

Thus, (b) and (c) have shown that $h(\bigoplus_w \hat A_i) = \cup_w(h(\hat A_i))$ and $h(\bigotimes_b \hat A_i) = \circ_b(h(\hat A_i))$.
7.  SUMMARY  AND  CONCLUSIONS
General  definitions  have  been  given  of  the  concepts  of 
Probabilistic  Language,  Probabilistic  Grammar,  and  Probabilistic  Automaton. 
The  set  of  context  free  probabilistic  languages  has  been  completely 
characterized in terms of (1) probabilistic context free grammars,
and (2) probabilistic tree automata. The motivation for this
work,  namely,  to  take  a  first  step  toward  developing 

a  quantitative  tool  for  analyzing  programming  languages  and  their  translators, 
was  explained  in  the  Introduction.   We  would  like  to  conclude  by  mentioning 
some  applications  of  and  problems  related  to  the  present  investigations. 

In  the  area  of  compiler  writing  for  programming  languages,  there 
are  various  syntax  oriented  methods  available  which  effectively  derive  code 
from  the  grammar  which  specifies  the  language.   By  keeping  a  frequency  count 
of  the  amount  of  usage  of  the  various  "sentences"  of  the  language,  where  the 
sentences  in  this  case  are  programs,  a  probabilistic  language  can  be  defined 
and  it  can  be  determined  which  of  several  probabilistic  grammars  generates  a 
better  approximate  language.   Furthermore,  it  can  be  speculated  that  an 
adaptive  compiler  (i.e.,  a  learning  compiler)  which  modifies  itself  to 
translate  high  frequency  sentences  best  may  be  obtainable  by  improving 
(altering)  the  approximating  grammar  as  more  and  more  frequency  data  is 
obtained. 

Probabilistic  languages  are  also  particularly  well  adapted  to 
problems  such  as  library  information  retrieval,  in  that  a  machine  based  upon 
the  concepts  of  probabilistic  languages  will  assign  a  probability  to  the 
relevance  of  any  description  to  any  document.   This  machine  can  then  be 
asked  for  documents  satisfying  certain  requirements  with  the  user  assigning 
weights to the importance of each requirement. The machine can then give
a  list  of  documents  as  output  in  order  of  probability  that  they  will  satisfy 
the  desired  requirements.   Such  a  machine  would  save  the  user  much  time 
when  the  bibliographies  were  very  large.   Also,  the  machine  would  list  the 
documents  most  likely  to  be  relevant,  even  if  this  likelihood  were  extremely 
low.  A  machine  using  nonprobabilistic  languages  would  often,  in  similar 
cases,  list  no  documents  at  all. 

One  of  the  interesting  areas  related  to  sets  of  trees  concerns 
decidability  questions.   The  effective  procedure  which  can  be  given  for 
checking  equality  of  two  tree  automata  implies  that  it  is  always  decidable 
whether  the  sets  of  trees  generated  by  two  context  free  grammars  are  the 
same  or  not.   If  the  sets  of  trees  generated  by  two  context  free  grammars 
are  not  the  same,  this  does  not  imply  that  the  sets  of  strings  generated  by 
the  grammars  are  not  the  same;  similarly,  if  the  two  sets  of  strings  generated 
are  identical,  this  does  not  imply  that  the  sets  of  trees  generated  are  the 
same. Indeed, the equality question is known to be undecidable for context free
sets of strings. Thus, we say that the degrees of unsolvability of these
two questions are incomparable.

Finally,  let  us  mention  one  possible  generalization  of  the  present 
investigation.   Can  one  define  a  Probabilistic  Turing  machine  in  an  analogous 
fashion  to  previous  definitions?  What  are  the  characteristics  and  applications 
of  this  machine  and  the  probabilistic  language  accepted  by  it,  if  defined?  A 
first  step  toward  answering  these  questions  appears  in  the  Appendix,  but  this 
area  is  otherwise  unexplored. 
APPENDIX  A

APPROXIMATION  OF  PROBABILISTIC  TURING  AUTOMATA 

BY  PROBABILISTIC  PUSHDOWN  AUTOMATA 
A Probabilistic Pushdown Storage Automaton (called a pushdown
P  automaton)    consists   of  a  finite  state   control  unit,  two  one-way 
infinite   tapes    (an  input  tape   and  a  storage  tape),    a  read  head  for  the 
input  tape,   and  a  read-write  head  for  the  storage   tape.      The  machine 
operates  by  changing  states   at  discrete  time  intervals.      The  new  state 
and  the   string  of  symbols  printed  on  the   storage   tape   at  time  t+1 
depend  probabilistically  upon  the   old  state   and  the   symbols   read  on  the 
two  tapes   at   time  t.      This    can  be   stated  formally. 

Definition: A Probabilistic Pushdown Storage Automaton $\hat A$ over $T$ is a system $(Q, M, S, \xi)$, where $Q = \{q_1, q_2, \ldots, q_n\}$ is a finite set of states, $S = \{s_1, s_2, \ldots, s_m\}$ is a finite set of storage tape symbols with $s_m = \#$, $\xi$ is an n-dimensional initial state vector, and $M$ is a probabilistic transition function.

A situation of $\hat A$ is a triple $(q, s, a)$. $\hat A$ is started in the situation $(q_0, \#, a_1)$, where $\xi(q_0) > 0$ and $a_1$ is the first symbol (not including $\#$) of some string $x \in T^+$ which is printed on the input tape. Pictorially, $\hat A$ has the following initial configuration.

[Figure: initial configuration of the input and storage tapes]

Then, using the terminology of Haines, $(q', s') \in M(q, s, a)$ will be written as $(q, s, a) \xrightarrow{p} (q', s')$, where $p$ is the probability associated with the transition. Only three types of instructions are allowed:
(1) $(q, s, a_i) \xrightarrow{p} (q', s')$. This means that if the automaton is in state $q$ and is scanning symbols $s$ and $a_i$ on the storage and input tapes, respectively, then with probability $p$ a transition into state $q'$ will occur, the storage and input tapes will move one square to the left, and then $s' \in S - \{\#\}$ will be printed on the storage tape one square to the right of $s$.

(2) $s' = \lambda$ implies that the storage tape is left unchanged and unmoved, so $(q', s) \in M(q, s, a_i)$. The next situation is $(q', s, a_{i+1})$, assuming $1 \le i < n$, where the input string is $x = a_1 a_2 \ldots a_n$.

(3) $s' = \sigma$ implies that the symbol $s$ is erased from the scanned square of the storage tape and the tape is moved one square to the right. If the string written on the storage tape at time $t$ before the transition was $s_{j_0} s_{j_1} \ldots s_{j_k} s$, $s_{j_i} \in S$, then the string at time $t+1$ after the transition is $s_{j_0} s_{j_1} \ldots s_{j_k}$; $(q', s_{j_k}) \in M(q, s, a_i)$, and the new situation is $(q', s_{j_k}, a_{i+1})$. This is defined only if $k \ge 0$.

If $a = \lambda$ in any of these instructions, then the input tape is left unmoved and the transition is independent of the input symbol scanned: $(q, s, \lambda) \xrightarrow{p} (q', s')$ implies $(q', s') \in M(q, s, a)$ for all $a \in T$, and the next situation is $(q', s', a)$.

A pushdown P automaton $\hat A$ terminates if it is in a situation $(q, s, a)$ such that $M(q, s, a) = \emptyset$. Also, if $\lambda \in M(q, s, a)$, then $\hat A$ may terminate in acceptance if the read-write head is scanning the first square of the storage tape and if during the last transition the read head has just read $a_n$ and the tape moved. We introduce the useful notational symbols $q_\lambda$ and $\#$ to use in instructions: $q_\lambda$ is the fictitious dead-end state discussed in Chapter 6 ($(q, s, a) \to (q_\lambda, s')$ means $\lambda \in M(q, s, a)$), and $\#$ is a fictitious blank symbol written on the square to the left of $a_1$ and on all squares to the right of $a_n$ on the input tape. Note $\# \notin T$, so $a = \#$ is not allowed in any instruction.

A transition sequence for x ∈ T+ is a sequence of situations (q_0, s_0, a_0), (q_1, s_1, a_1), ..., (q_n, s_n, a_n), where A is started in an initial configuration with xy written on the input tape for some string y ∈ T*, and a sequence of elementary transitions consistent with M occurs,

$$(q_i, s_i, a_i) \xrightarrow{p_{i+1}} (q_{i+1}, s_{i+1}),$$

where p_{i+1} is the probability of the transition. The situation after the sequence of transitions must be (q_n, s_n, a_n), where a_n = y_1, the first terminal symbol of y. If, further, y = Λ and (q_n, s_n, a_n) is the final situation (q_t, #, #), then the sequence is called an accepting transition sequence. The probability of the k-th transition sequence is the product of the probabilities of its elementary transitions,

$$p_k = \prod_{i=1}^{n} p_i .$$

The probability of partial acceptance of x is

$$\mu^*(x) = \sum_{k=1}^{m} \xi(q_0)\, p_k ,$$

where m is the number of transition sequences for x. The probability of acceptance of x is

$$\mu(x) = \sum_{k=1}^{\ell} \xi(q_0)\, p_k ,$$

where ℓ is the number of accepting transition sequences for x. The analogue of type 4 normalization (i.e., the total probability of leaving any state must sum to 1) is, for each state q ∈ Q,

$$\sum_{s \in S} \sum_{s' \in S} \sum_{a \in T} \sum_{q' \in Q} p[(q, s, a) \rightarrow (q', s')] = 1 .$$
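The definitions of μ(x) and of type 4 normalization can be exercised on a small example. The Python sketch below is illustrative and not the author's procedure. It models the storage tape as a tuple whose first entry is the # in square 0, treats s' ∈ S as a push, enumerates transition sequences depth-first, and sums ξ(q_0)·p_k over those ending in the final situation (q_t, #, #). It does not handle λ-instructions, so every step consumes an input symbol and the search always terminates. The toy automaton `M` and all function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

BLANK, LAMBDA, SIGMA, Q_T = "#", "Λ", "σ", "q_t"


def is_type4_normalized(M):
    """For every state q, check that the probabilities of all instructions leaving q sum to 1."""
    totals = defaultdict(Fraction)  # Fraction() == 0, so each total starts at zero
    for (q, _s, _a), targets in M.items():
        for _q_next, _s_new, p in targets:
            totals[q] += Fraction(p)
    return all(t == 1 for t in totals.values())


def apply_action(tape, s_new):
    """Storage tape after the action s_new, or None if the action is undefined."""
    if s_new == LAMBDA:
        return tape
    if s_new == SIGMA:
        return tape[:-1] if len(tape) >= 2 else None
    return tape + (s_new,)  # assumed: s' in S pushes s'


def acceptance_probability(M, xi, x):
    """mu(x): sum of xi(q0) * p_k over the accepting transition sequences for x."""
    total = Fraction(0)
    # Each entry: (state, storage tape, 0-based input position, probability so far).
    frontier = [(q0, (BLANK,), 0, Fraction(w)) for q0, w in xi.items() if w]
    while frontier:
        q, tape, pos, prob = frontier.pop()
        if q == Q_T:
            # Dead-end state: accept only in the final situation (q_t, #, #).
            if tape == (BLANK,) and pos == len(x):
                total += prob
            continue
        # Past the end of x the scanned symbol is #, which no instruction may read.
        a = x[pos] if pos < len(x) else BLANK
        for q_next, s_new, p in M.get((q, tape[-1], a), []):
            new_tape = apply_action(tape, s_new)
            if new_tape is not None:
                frontier.append((q_next, new_tape, pos + 1, prob * Fraction(p)))
    return total


# Hypothetical toy automaton over T = {a}: in q0, reading a, it either stays
# in q0 or moves to the dead-end state q_t, each with probability 1/2.
M = {("q0", BLANK, "a"): [("q0", LAMBDA, Fraction(1, 2)), (Q_T, LAMBDA, Fraction(1, 2))]}
xi = {"q0": 1}
```

With these choices `is_type4_normalized(M)` is true and μ(a^n) = (1/2)^n, so the acceptance probabilities sum to 1 over n ≥ 1; for instance `acceptance_probability(M, xi, "aaa")` gives `Fraction(1, 8)`.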
Example ?: A pushdown P automaton for the language {a^n b^n | n > 0}. This automaton is type 4 normalized.

A = (Q, M, S, ξ) over T = {a, b}

(1) Q = {q_0, q_1, q_2}

(2) S = {s}

(3) M has instructions:

(q_0, #, a) → (q_1, s)

(q_1, s, a) →[1/2] (q_1, s)

(q_1, s, b) →[1/2] (q_2, σ)

(q_2, s, b) → (q_2, σ)

(q_2, s, b) → (q_t, σ)

(4) ξ = (1, 0, 0)

The model of Turing automaton defined herein is derived from the 1-tape online Turing machine. A Probabilistic Turing Automaton (called a Turing P automaton) is basically a pushdown P automaton in which the storage tape is allowed to move left and right without erasing. More formally,

Definition: A Probabilistic Turing Automaton A over T is a system (Q, M, S, ξ), where Q is a finite set of states, S is a finite set of storage tape symbols, M is a probabilistic transition function, M: Q × T × S → P(Q × S × J) ∪ {Λ}, where J = {-1, 0, 1}, and ξ is the initial distribution vector. An instruction of A may be written (q, s, a) →[p] (q', s', j), where j = 1 indicates move the tape one square to the left and then print s', j = -1 implies print s' and then move the tape one square to the right, and j = 0 implies print s' but do not move the tape.
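A sketch of how the direction code j acts on the storage tape may help. It is illustrative only and models just the storage-tape part of an instruction: the tape is a Python list that grows to the right, square 0 holds #, never-written squares hold a hypothetical blank `_`, the head position is an index into the list, and s' = Λ is taken to mean that nothing is printed, as in the instruction types of the algorithm that follows. The name `turing_step` is hypothetical.

```python
from typing import List, Optional, Tuple

LAMBDA = "Λ"  # assumed meaning of s' = Λ: print nothing
BLANK = "_"   # hypothetical content of a square that has never been written


def turing_step(tape: List[str], head: int, s_new: str, j: int) -> Optional[Tuple[List[str], int]]:
    """Apply the storage part of (q, s, a) -> (q', s', j) to a tape infinite to the right.

    Returns (new_tape, new_head), or None if the head would leave square 0.
    """
    if j not in (-1, 0, 1):
        raise ValueError("j must be -1, 0 or 1")
    tape = list(tape)
    if j == 1:
        # j = 1: move the tape one square left (head one square right) before printing.
        head += 1
        if head == len(tape):
            tape.append(BLANK)
    if s_new != LAMBDA:
        tape[head] = s_new  # print on the scanned square (for j = 1, the newly reached one)
    if j == -1:
        # j = -1: after printing, move the tape one square right (head one square left).
        head -= 1
        if head < 0:
            return None  # the storage tape is only infinite to the right
    return tape, head


# The last move rereads square 1, which still holds y; a pushdown store
# would have erased that square when the head stepped back.
tape, head = turing_step(["#"], 0, "x", 1)         # ["#", "x"], head at square 1
tape, head = turing_step(tape, head, "y", -1)      # overwrite x with y, head back at 0
tape2, head2 = turing_step(tape, head, LAMBDA, 1)  # move right again without printing
```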

All other definitions and restrictions are the same as for pushdown P automata. This version of Turing automaton differs from the usual in that it is probabilistic, it contains a move-before-write type of instruction, its storage tape is only infinite to the right, and it must terminate by scanning the # in the initial square. It is easy to see that the latter three alterations do not change the set of languages accepted by a Turing automaton.
Algorithm: Given any Turing P automaton A, the following procedure yields a pushdown P automaton A', which accepts an approximation language L(A') = (T+, μ') to the language accepted by A, L(A) = (T+, μ). Let A = (Q, M, S, ξ), where we restrict ξ to be 1 for one state (the initial state) and 0 for all others; then A' = (Q', M', S', ξ'). S' consists of all of S together with stack symbols ?s for all symbols s in S. M' can be described by a set of instructions:

(a) For each (q, s, a) →[p] (q', s', +1) in M, put (q, s, a) →[p] (q', s') and (q, ?s, a) →[p] (q', s') into M'.

(b) For each (q, s, a) →[p] (q', s', 0) in M, put (q, s, a) →[p] (q_0, σ), (q_0, λ, λ) → (q', s'), and (q, ?s, a) →[p] (q_0, σ) into M', where q_0 ∈ Q' is a new state which appears only in these three instructions.


(c) For each (q, s, a) →[p] (q', s', -1) in M, put (q, s, a) →[p] (q_1, σ), (q_1, λ, λ) → (q_2, σ), (q_2, λ, λ) → (q', s'), and (q, ?s, a) →[p] (q_1, σ) into M'. Put the new symbols q_1 and q_2 into Q'.

(d) For each (q, s, a) →[p] (q', Λ, -1) in M, put (q, s, a) →[p] (q', σ) and (q, ?s, a) →[p] (q', σ) into M'.

(e) For each (q, s, a) →[p] (q', Λ, 0) in M, put (q, s, a) →[p] (q', Λ) and (q, ?s, a) →[p] (q', Λ) into M'.

(f) For each (q, s, a) →[p] (q', Λ, +1) in M, put (q, s, a) →[p] (q', ?s) into M' for all symbols ?s defined in S'.

The six instruction types cover all possible directions, with and without printing, in which the Turing P automaton can move. Notice that the pushdown P automaton can simulate all of these transitions except moving to the right without printing, in which case it prints ?s. Thus, if type (f) instructions are not used in the Turing P automaton program, then the P language recognized by the Turing P automaton will also be recognized by the pushdown P automaton constructed from it by the preceding algorithm. Furthermore, if the square in which ?s would be printed by the pushdown P automaton is never revisited by the Turing P automaton, then the approximation is again exact.
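The translation can be sketched in Python for instruction types (a), (b), (d) and (e); types (c) and (f) are left out of this sketch and raise `NotImplementedError`. This is a hypothetical illustration, not the author's code: instructions are triples ((q, s, a), (q', s', j), p), `ANY` stands for the λ entries of the helper instruction in (b), the fresh helper states receive invented names r0, r1, ..., and the helper instruction is assumed to have probability 1.

```python
from itertools import count

LAMBDA, SIGMA = "Λ", "σ"  # print nothing / erase and move the tape right
ANY = "λ"                 # assumed marker: the instruction ignores the scanned symbol


def indefinite(s):
    """The stack symbol ?s of S' that stands for an unknown copy of s."""
    return "?" + s


def translate(turing_instructions):
    """Map Turing P instructions of types (a), (b), (d), (e) to pushdown P instructions."""
    fresh = count()
    out = []
    for (q, s, a), (q2, s2, j), p in turing_instructions:
        # Every rule is emitted for the scanned symbol s and for its indefinite form ?s.
        lhss = [(q, s, a), (q, indefinite(s), a)]
        if j == 1 and s2 != LAMBDA:        # (a) move the tape left, then print: push s'
            out += [(lhs, (q2, s2), p) for lhs in lhss]
        elif j == 0 and s2 != LAMBDA:      # (b) overwrite: pop, then push s'
            mid = f"r{next(fresh)}"        # hypothetical name for the new state
            out += [(lhs, (mid, SIGMA), p) for lhs in lhss]
            out.append(((mid, ANY, ANY), (q2, s2), 1))  # probability 1 assumed
        elif j == -1 and s2 == LAMBDA:     # (d) move the tape right without printing: pop
            out += [(lhs, (q2, SIGMA), p) for lhs in lhss]
        elif j == 0 and s2 == LAMBDA:      # (e) no move, no print
            out += [(lhs, (q2, LAMBDA), p) for lhs in lhss]
        else:                              # (c) and (f) are not covered here
            raise NotImplementedError(f"instruction type with j={j}, s'={s2!r}")
    return out
```

Emitting each rule for both s and ?s mirrors the pairs of instructions listed in (a), (b), (d) and (e).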

Definition: The (4-normalized) Initial Definite P language, L_ID, of a pushdown P automaton A' which approximates a Turing P automaton A is the set of all strings x which are maximal initial definite segments of strings z ∈ T* such that μ(z) > 0. The probability assigned to x is the probability of partial acceptance with respect to A', μ_ID(x) = μ*(x). All other elements y of T* have μ_ID(y) = 0. A string x is maximal initial definite if it fulfills

(1) initial: ∃ y ∈ T* such that xy = z and μ(z) > 0;

(2) definite: ∃ a transition sequence for x with respect to A' which contains no (q, ?s, a) →[p] (q', s') type instructions (called indefinite instructions);

(3) maximal: ∃ a transition sequence for x fulfilling (2) above which can be extended to an accepting transition sequence for z with respect to A' such that the first instruction after the initial definite segment x is indefinite, (q, ?s, a) → (q', s').

Define

$$\mu_{ID} = \sum_{x \in T^+} \mu_{ID}(x).$$

Define the L_ID state set as the set of states of Q' in which the pushdown P automaton can reside after strings x such that μ_ID(x) > 0 have been input.

Theorem: Let A be a Turing P automaton. Let A' be the approximating pushdown P automaton as described above. If the states of A' reachable from the L_ID state set all have type 4 normalization (where q reachable means there is a transition sequence such that q_n = q and q_0 ∈ L_ID state set), then the following error bound holds:

$$\sum_{z \in T^+} |\mu(z) - \mu'(z)| \le \mu_{ID} - \sum_{z \in T^+} \mu(z).$$

Proof: Consider a Markov chain whose states are the states of the automaton A', and whose transition probabilities p_ij are just the probabilities p_ij of a transition from state q_i to q_j. Take as initial state any q_i ∈ L_ID state set. By normalization, Σ_j p_ij = 1.

It is then possible to compare μ_ID(x) to μ'(z) for all strings z ∈ L(A') such that z = xy for some string y:

$$\mu_{ID}(x) \ge \sum_{z = xy} \mu'(z).$$

The relation is an inequality rather than an equality because we have not excluded anomalies such as instructions which lead to dead-end, nonaccepting states. Since all accepting transition sequences with respect to A' begin with transition sequences for some x with μ_ID(x) > 0,

$$\sum_{x \in L_{ID}} \mu_{ID}(x) \ge \sum_{z \in T^+} \mu'(z), \quad\text{that is,}\quad \mu_{ID} \ge \sum_{z \in T^+} \mu'(z).$$

Moreover, each string z is accepted under this approximation with probability at least μ(z), so μ'(z) ≥ μ(z) and

$$\begin{aligned}
\sum_{z \in T^+} |\mu(z) - \mu'(z)| &= \sum_{z \in T^+} \bigl(\mu'(z) - \mu(z)\bigr) \\
&= \sum_{z \in T^+} \mu'(z) - \sum_{z \in T^+} \mu(z) \\
&\le \mu_{ID} - \sum_{z \in T^+} \mu(z) \\
&= \mu_{ID} - 1 \quad \text{if } L(A) \text{ is an NP language.}
\end{aligned}$$

Thus, if the error of approximation is measured by Σ_{z ∈ T+} |μ(z) − μ'(z)|, then an error bound is μ_ID − 1.
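The chain of (in)equalities at the end of the proof can be checked on small numbers. The values below are hypothetical, chosen only so that the proof's premises hold: μ sums to 1, μ'(z) ≥ μ(z) for every z, and μ_ID ≥ Σ μ'(z). The function name `error_and_bound` is likewise invented for illustration.

```python
from fractions import Fraction as F


def error_and_bound(mu, mu_prime, mu_id):
    """Return (sum |mu(z) - mu'(z)|, mu_ID - sum mu(z)) for finitely supported mu, mu'."""
    zs = set(mu) | set(mu_prime)
    m = {z: mu.get(z, F(0)) for z in zs}
    mp = {z: mu_prime.get(z, F(0)) for z in zs}
    # Premise used in the proof: the approximation never lowers a probability.
    assert all(mp[z] >= m[z] for z in zs)
    error = sum((abs(m[z] - mp[z]) for z in zs), F(0))
    # |mu - mu'| = mu' - mu term by term, so the error equals sum mu' - sum mu.
    assert error == sum(mp.values(), F(0)) - sum(m.values(), F(0))
    return error, mu_id - sum(m.values(), F(0))


# Hypothetical distributions (not taken from the thesis).
mu = {"ab": F(1, 2), "aabb": F(1, 4), "aaabbb": F(1, 4)}        # sums to 1
mu_prime = {"ab": F(1, 2), "aabb": F(3, 8), "aaabbb": F(1, 4)}  # pointwise >= mu
mu_id = F(5, 4)                                                 # >= sum of mu' = 9/8

error, bound = error_and_bound(mu, mu_prime, mu_id)
```

Here the error is 1/8 and the bound μ_ID − 1 is 1/4, so the inequality of the theorem holds with room to spare.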
APPENDIX B
EXAMPLES  OF  REGULAR  TREE  EXPRESSIONS 
Expression conventionally written (a ∪ b)*c is written

a <ω> ∪ b <ω> ∪ c

The context free grammar of example 7 is

A → bAB,   A → bB,   B → b

A → cAC,   A → cC,   C → c

The corresponding RTE is

b <[ω]·[b]> + c <[ω]·[c]> + b <[b]> + c <[c]>

The PTA of example 4 is

[Figure: state diagram of the PTA of example 4; transitions labeled a,p]

The corresponding RTE, written with superscript probabilities (such as (1) and (0)) and subscript vectors, is

a <[ω_1]·[b <[ω_1]·[ω_2]> + b <[ω_1]>]> + a <[ω_1]>